We consider Hermitian and symmetric random band matrices $H$ in $d \geq 1$ dimensions. The matrix
elements $H_{xy}$, indexed by $x, y \in \Lambda \subset \mathbb{Z}^d$, are independent, uniformly distributed random variables if
$|x - y|$ is less than the band width $W$, and zero otherwise. We prove that the time evolution of a
quantum particle subject to the Hamiltonian $H$ is diffusive on time scales $t \ll W^{d/3}$. We also show that
the localization length of the eigenvectors of $H$ is larger than a factor $W^{d/6}$ times the band width. All
results are uniform in the size $|\Lambda|$ of the matrix.

1. Introduction

The general formulation of the universality conjecture for disordered systems states that there are two 
distinctive regimes depending on the energy and the disorder strength. In the strong disorder regime, the 
eigenfunctions are localized and the local spectral statistics are Poisson. In the weak disorder regime, the 
eigenfunctions are delocalized and the local statistics coincide with those of a Gaussian matrix ensemble.

Random band matrices are natural intermediate models to study eigenvalue statistics and quantum
propagation in disordered systems as they interpolate between Wigner matrices and random Schrödinger
operators. Wigner matrix ensembles represent mean-field models without spatial structure, where the quantum
transition rates between any two sites are i.i.d. random variables with zero expectation. In the celebrated
Anderson model [5], only a random on-site potential $V$ is present in addition to a short range deterministic
hopping (Laplacian) on a graph that is typically a regular box in $\mathbb{Z}^d$.

For the Anderson model, a fundamental open question is to establish the metal-insulator transition, i.e. 
to show that in $d \geq 3$ dimensions the eigenfunctions of $-\Delta + \lambda V$ are delocalized for small disorder $\lambda$. The
localization regime at large disorder or near the spectral edges has been well understood by Fröhlich and
Spencer with the multiscale technique [29,30], and later by Aizenman and Molchanov by the fractional 
moment method [3]; many other works have since contributed to this field. In particular, it has been 
established that the local eigenvalue statistics are Poisson [38] and that the eigenfunctions are exponentially 
localized with an upper bound on the localization length that diverges as the energy parameter approaches 
the presumed phase transition point [15,43]. 

The progress in the delocalization regime has been much slower. For the Bethe lattice, corresponding
to the infinite-dimensional case, delocalization has been established in [4,27,35]. In finite dimensions only
partial results are available. The existence of an absolutely continuous spectrum (i.e. extended states) has
been shown for a rapidly decaying potential, corresponding to a scattering regime [8,10,39]. Diffusion has
been established for a heavy quantum particle immersed in a phonon field in $d \geq 4$ dimensions [28]. For
the original Anderson Hamiltonian with a small coupling constant $\lambda$, the eigenfunctions have a localization
length of at least $\lambda^{-2}$ (see [9]). The time and space scale $\lambda^{-2}$ corresponds to the kinetic regime where the
quantum evolution can be modelled by a linear Boltzmann equation [24,45]. Beyond this time scale the
dynamics is diffusive. This has been established in the scaling limit $\lambda \to 0$ up to time scales $t \sim \lambda^{-2-\kappa}$ with
an explicit $\kappa > 0$ in [18-20]. There are no rigorous results on the local spectral statistics of the Anderson
model, but it is conjectured - and supported by numerous arguments in the physics literature, especially 
by supersymmetric methods (see [14]) - that the local correlation function of the eigenvalues of the finite 
volume Anderson model follows the GOE statistics in the thermodynamic limit. 

Due to their mean-field character, Wigner matrices are simpler to study than the Anderson model and 
they are always in the delocalization regime. The complete delocalization of the eigenvectors was proved
in [21]. The local spectral statistics in the bulk are universal, i.e. they follow the statistics of the corresponding
Gaussian ensemble (GOE, GUE, GSE), depending on the symmetry type of the matrix (see [37] for explicit
formulas). For an arbitrary single entry distribution, bulk universality has been proved recently in [17,22,23]
for all symmetry classes. A different proof was given in [46] for the Hermitian case.

Random band matrices $H = \{H_{xy}\}_{x,y \in \Gamma}$ represent systems on a large finite graph $\Gamma$ with a metric. The
matrix elements between two sites, $x$ and $y$, are independent random variables with a variance $\sigma_{xy}^2 := \mathbb{E}|H_{xy}|^2$
depending on the distance between the two sites. The variance typically decays with the distance on a
characteristic length scale $W$, called the band width of $H$. This terminology comes from the simplest
one-dimensional model where the graph is a path on $N$ vertices, labelled by $\Gamma = \{1, 2, \ldots, N\}$, and the matrix
elements $H_{xy}$ vanish if $|x - y| \geq W$. If $W = N$ and all variances are equal, we recover the usual Wigner
matrix. The case $W = O(1)$ is a one-dimensional Anderson-type model with random hoppings at bounded
range. Higher-dimensional models are obtained if the graph $\Gamma$ is a box in $\mathbb{Z}^d$. For more general random band
matrices and for a systematic presentation, see [44].

Since the one-dimensional Anderson-type models are always in the localization regime, varying the band 
width W offers a possibility to test the localization-delocalization transition between an Anderson-type 
model and the Wigner ensemble. Numerical simulations and theoretical arguments based on supersymmetric 
methods [31] suggest that the local eigenvalue statistics change from Poisson, for $W \ll N^{1/2}$, to GOE (or
GUE), for $W \gg N^{1/2}$. The eigenvectors are expected to have a localization length $\ell$ of order $W^2$. In
particular the eigenvectors are fully delocalized for $W \gg N^{1/2}$. In two dimensions the localization length is
expected to be exponentially large in $W$; see [1]. In accordance with the extended states conjecture for the
Anderson model, the localization length is expected to be macroscopic, $\ell \sim N$, independently of the band
width in $d \geq 3$ dimensions.

Extending the techniques of the rigorous proofs for Anderson localization, Schenker has recently proved 
the upper bound $\ell \leq W^8$ for the localization length in $d = 1$ dimensions [40]. In this paper we prove a
counterpart of this result from the side of delocalization. More precisely, we show a lower bound $\ell \geq W^{1+d/6}$
for the eigenvectors of $d$-dimensional band matrices with uniformly distributed entries. We remark that the
lower bound $\ell \geq W$ was proved recently in [25] for very general band matrices.

On the spectral side, we mention that, apart from the semicircle law (see [2,25,33] for d = 1 and [11] for 
d = 3) , the question of bulk universality of local spectral statistics for band matrices is mathematically open 
even for d = 1. In the spirit of the general conjecture, one expects GUE/GOE statistics in the bulk for the 
delocalization regime, $W \gg N^{1/2}$. The GUE/GOE statistics have recently been established [25] for a class of
generalized Wigner matrices, where the variances of different matrix elements are not necessarily identical,
but are of comparable size, i.e. $\mathbb{E}|H_{xy}|^2 \sim \mathbb{E}|H_{x'y'}|^2$; in particular, the band width is still macroscopic
($W \sim N$).

Supersymmetric methods offer a very attractive approach to study the delocalization transition in band
matrices but the rigorous control of the functional integrals away from the saddle points is difficult and it
has been performed only for the density of states [11]. Effective models that emerge near the saddle points
can be more accessible to rigorous mathematics. Recently Disertori, Spencer and Zirnbauer studied a related
statistical mechanics model that is expected to reflect the Anderson localization and delocalization transition
for real symmetric band matrices. They proved a quasi-diffusive estimate for the two-point correlation 
functions in a three dimensional supersymmetric hyperbolic nonlinear sigma model at low temperatures [13]. 
Localization was also established in the same model at high temperatures [12]. 

We also mention that band matrices are not the only possible interpolating models to mimic the
metal-insulator transition. Other examples include the Anderson model with a spatially decaying potential [8,34]
and a quasi one-dimensional model with a weak on-site potential for which a transition in the sense of local 
spectral statistics has been established in [6,47]. 

A natural approach to study the delocalization regime is to show that the quantum time evolution is
diffusive on large scales. We normalize the matrix entries so that the rate of quantum jumps is of order
one. The typical distance of a single jump is the band width $W$. If the jumps were independent, the typical
distance travelled in time $t$ would be $W\sqrt{t}$. Using the argument of [9], we show that a typical localization
length $\ell$ is incompatible with a diffusion on spatial scales larger than $\ell$. Thus we obtain $\ell \geq W\sqrt{t}$, provided
that the diffusion approximation can be justified up to time $t$.

The main result of this paper is that the quantum dynamics of the d-dimensional band matrix is given 
by a superposition of heat kernels up to time scales $t \ll W^{d/3}$. Although diffusion is expected to hold up to
time $t \sim W^2$ for $d = 1$ and up to any time for $d \geq 3$ (assuming the thermodynamic limit has been taken),
our method can follow the quantum dynamics only up to $t \ll W^{d/3}$. The threshold exponent $d/3$ originates
in technical estimates on certain Feynman graphs; going beyond the exponent $d/3$ would require a further
resummation of certain four-legged subdiagrams (see Section 11).

Finally we remark that our method also yields a bound on the largest eigenvalue of a band matrix; see 
Theorem 3.4 in the forthcoming paper [16] for details. 

Acknowledgements. The problem of diffusion for random band matrices originated from several discussions 
with H.T. Yau and J. Yin. The authors are especially grateful to J. Yin for various insights and for pointing 
out an improvement in the counting of the skeleton diagrams.

2. The Setup

Let the dimension $d \geq 1$ be fixed and consider the $d$-dimensional lattice $\mathbb{Z}^d$ equipped with the Euclidean
norm $|\cdot|_{\mathbb{Z}^d}$ (any other norm would also do). We index points of $\mathbb{Z}^d$ with $x, y, z, \ldots$. Let $W > 1$ denote a
large parameter (the band width) and define

$$M = M(W) := \big|\{x \in \mathbb{Z}^d : 1 \leq |x|_{\mathbb{Z}^d} \leq W\}\big|,$$

the number of points at distance at most $W$ from the origin, excluding the origin itself. In the following we tacitly
make use of the obvious relation $M \sim CW^d$. For notational convenience, we use both $W$ and $M$ in the following.

In order to avoid dealing with the infinite lattice directly, we restrict the problem to a finite periodic
lattice $\Lambda_N$ of linear size $N$. More precisely, for $N \in \mathbb{N}$, we set

$$\Lambda_N := \{-[N/2], \ldots, N - 1 - [N/2]\}^d \subset \mathbb{Z}^d,$$

a cube with side length $N$ centred around the origin. Here $[\cdot]$ denotes integer part. We regard $\Lambda_N$ as periodic,
i.e. we equip it with periodic addition and the periodic distance

$$|x| := \inf\{|x + N\nu|_{\mathbb{Z}^d} : \nu \in \mathbb{Z}^d\}.$$

Unless otherwise stated, all summations $\sum_x$ are understood to mean $\sum_{x \in \Lambda_N}$.

We consider random matrices $H^\omega \equiv H$ whose entries $H_{xy}$ are indexed by $x, y \in \Lambda_N$. Here $\omega$ denotes
the running element in probability space. The large parameter of the model is the band width $W$. We shall
always assume that $N \geq W M^{1/6}$. Under this condition all our results hold uniformly in $N$.

We assume that $H$ is either Hermitian or symmetric. The entries $H_{xy}$ satisfying $1 \leq |x - y| \leq W$ are
i.i.d. (with the obvious restriction that $H_{yx} = \overline{H_{xy}}$). In the Hermitian case they are uniformly distributed
on a circle of appropriate radius in the complex plane,

$$H_{xy} \sim \frac{1}{\sqrt{M-1}}\,\mathrm{Unif}(S^1), \qquad 1 \leq |x - y| \leq W. \tag{2.1a}$$

In the symmetric case they are Bernoulli random variables,

$$\mathbb{P}\Big(H_{xy} = \frac{1}{\sqrt{M-1}}\Big) = \mathbb{P}\Big(H_{xy} = -\frac{1}{\sqrt{M-1}}\Big) = \frac{1}{2}, \qquad 1 \leq |x - y| \leq W. \tag{2.1b}$$

If $|x - y| \notin [1, W]$ then $H_{xy} = 0$. An important consequence of our assumptions (2.1a) and (2.1b) is

$$|H_{xy}|^2 = \frac{1}{M-1}\,1(1 \leq |x - y| \leq W). \tag{2.2}$$

We remark that the assumption that the matrix entries have the special form (2.1a) or (2.1b) is not 
necessary for our results to hold. We make it here because it greatly simplifies our proof. The reason for 
this is that, as observed by Feldheim and Sodin [26,42], the condition (2.2) allows one to obtain a simple
algebraic expression for the nonbacktracking powers of H; see Lemma 5.2. 

In the forthcoming paper [16] we extend our results to random matrix ensembles in which the matrix 
elements H xy are allowed to have a general distribution (and thus in particular a genuinely random absolute 
value); moreover their variances $\mathbb{E}|H_{xy}|^2$ are given by a general profile on the scale $W$ in $x - y$ (as opposed
to the step function profile in (2.2)). Under these assumptions, the algebraic identity of Lemma 5.2 is no
longer exact, and needs to be amended with additional random terms. The resulting graphical expansion is
considerably more involved than in the case (2.2), and its control requires essential new ideas. However, the 
fundamental mechanism underlying quantum diffusion for band matrices is already apparent in the special 
case (2.2) discussed in this paper. 

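Let $\alpha \in \mathfrak{A} := \{1, \ldots, |\Lambda_N|\}$ index the orthonormal basis $\{\psi_\alpha^\omega\}_{\alpha \in \mathfrak{A}}$ of eigenvectors of the matrix $H^\omega$, i.e.
$H^\omega \psi_\alpha^\omega = \lambda_\alpha^\omega \psi_\alpha^\omega$ where $\lambda_\alpha^\omega \in \mathbb{R}$. The normalization of the matrix elements is chosen in such a way that the
typical eigenvalue of the matrix is of order one:

$$\frac{1}{|\Lambda_N|}\,\mathbb{E}\sum_{\alpha}(\lambda_\alpha^\omega)^2 = \frac{1}{|\Lambda_N|}\,\mathbb{E}\,\mathrm{Tr}\,H^2 = \frac{1}{|\Lambda_N|}\sum_{x,y}\mathbb{E}|H_{xy}|^2 = \frac{M}{M-1}.$$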
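The normalization can be checked on a single sample. The following is a minimal sketch, assuming $d = 1$ (so that $M = 2W$) and $N > 2W$ (so that the $2W$ neighbours of a site on the periodic lattice are distinct). The helper name `band_matrix` and the values $N = 30$, $W = 4$ are illustrative choices, not taken from the paper. It draws one matrix from the Hermitian ensemble (2.1a) and evaluates $|\Lambda_N|^{-1}\sum_{x,y}|H_{xy}|^2$, which should equal $M/(M-1)$ for every sample because the moduli $|H_{xy}|$ are deterministic.

```python
import cmath
import math
import random


def band_matrix(N, W, rng):
    """One sample of the Hermitian ensemble (2.1a) on the periodic lattice {0, ..., N-1} (d = 1)."""
    M = 2 * W  # number of sites y with 1 <= |x - y| <= W in one dimension
    scale = 1.0 / math.sqrt(M - 1)
    H = [[0j] * N for _ in range(N)]
    for x in range(N):
        for y in range(x + 1, N):
            distance = min(y - x, N - (y - x))  # periodic distance
            if 1 <= distance <= W:
                phase = cmath.exp(2j * math.pi * rng.random())  # uniform on the unit circle
                H[x][y] = scale * phase
                H[y][x] = scale * phase.conjugate()  # Hermitian constraint
    return H, M


rng = random.Random(0)
N, W = 30, 4
H, M = band_matrix(N, W, rng)

# (1/N) Tr H^2 = (1/N) sum_{x,y} |H_xy|^2, expected to equal M / (M - 1)
normalized_trace = sum(abs(H[x][y]) ** 2 for x in range(N) for y in range(N)) / N
```

Since only the phases are random here, the identity holds exactly per sample rather than merely in expectation.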



3. Scaling and results 



The central quantity of our analysis is 

$$\varrho(t, x) := \mathbb{E}\,\big|\langle \delta_x, e^{-itH/2}\delta_0\rangle\big|^2,$$

where $\delta_x \in \ell^2(\Lambda_N)$ denotes the standard basis vector, defined by $(\delta_x)_y := \delta_{xy}$. The factor 1/2 is a convenient
normalization since, by a standard result of random matrix theory, the spectrum of $H/2$ is asymptotically
equal to the unit interval $[-1, 1]$. The function $\varrho(t, x)$ describes the ensemble average of the quantum
transition probability of a particle starting from position 0 and ending up at position $x$ after time $t$. Note that
$\sum_x \varrho(t, x) = 1$ for any $t \in \mathbb{R}$. Heuristically, the particle performs a series of random jumps of size $W$. The
typical number of jumps in time $t = O(1)$ is of order one. Indeed, by first order perturbation theory, the
small-time probability distribution for $1 \leq |x| \leq W$ is given by

$$\varrho(t, x) \approx \mathbb{E}\,\big|\langle \delta_x, (1 - itH/2)\delta_0\rangle\big|^2 = \frac{t^2}{4}\,\mathbb{E}|H_{x0}|^2 = \frac{t^2}{4(M-1)},$$

up to higher order terms in $t$. Thus $\sum_{x \neq 0}\varrho(t, x)$ is an $O(1)$ quantity, separated away from zero, indicating
that the distance from the origin is of $O(W)$ for times $t \sim O(1)$.

In time $t$ the particle performs $O(t)$ jumps of size $O(W)$. We expect that the jumps are approximately
independent and the trajectory is a random walk consisting of $O(t)$ steps with size $O(W)$ each. Thus, the
typical distance from the origin is of order $t^{1/2}W$. We rescale time and space $(t, x) \mapsto (T, X)$ so as to make
the macroscopic quantities $T$ and $X$ of order one, i.e. we set

$$t = \eta T, \qquad x = \eta^{1/2} W X,$$

where $W$ and $\eta$ are two large parameters. Ideally, one would like to study the long time limit $\eta \to \infty$ for
a fixed $W$. In this case, however, we know that the dynamics cannot be diffusive for $d = 1$. Indeed, as
explained in the introduction, it is expected that the motion cannot be diffusive for distances larger than
$W^2$; this has in fact been proved [40] for distances larger than $W^8$. Thus we have to consider a scaling limit
where $\eta$ and $W$ are related and they tend simultaneously to infinity. To that end we choose an exponent
$\kappa > 0$ and set $\eta = \eta(W) := W^{d\kappa}$.

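Our first main result establishes that $\varrho(t, x)$ behaves diffusively up to time scales $t = O(W^{d\kappa})$ if $\kappa < 1/3$.

Theorem 3.1 (Quantum diffusion). Let $0 < \kappa < 1/3$ be fixed. Then for any $T_0 > 0$ and any continuous
bounded function $\varphi \in C_b(\mathbb{R}^d)$ we have

$$\lim_{W \to \infty}\sum_{x \in \Lambda_N}\varrho\big(W^{d\kappa}T, x\big)\,\varphi\Big(\frac{x}{W^{1 + d\kappa/2}}\Big) = \int_{\mathbb{R}^d} dX\, L(T, X)\,\varphi(X), \tag{3.1}$$

uniformly in $N \geq W^{1 + d/6}$ and $0 \leq T \leq T_0$. Here

$$L(T, X) := \int_0^1 d\lambda\,\frac{4}{\pi}\frac{\lambda^2}{\sqrt{1 - \lambda^2}}\,G(\lambda T, X),$$

and $G$ is the heat kernel

$$G(T, X) := \Big(\frac{d + 2}{2\pi T}\Big)^{d/2} e^{-\frac{d + 2}{2T}|X|^2}. \tag{3.2}$$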

Remark 3.2. The factor $d + 2$ arises from a random walk in $d$ dimensions with steps in the unit ball. If
$B$ is a random variable uniformly distributed in the $d$-dimensional unit ball, the covariance matrix of $B$ is
$(d + 2)^{-1}\mathbb{1}$.
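The covariance claim of Remark 3.2 is easy to probe numerically. The sketch below is a plain Monte Carlo estimate, assuming $d = 3$ and rejection sampling from the enclosing cube. The function name `ball_coordinate_variance`, the sample size and the seed are illustrative. By symmetry the mean of $B$ vanishes and the covariance is a multiple of the identity, so it suffices to estimate the second moment of one coordinate, which should be close to $1/(d+2) = 1/5$.

```python
import random


def ball_coordinate_variance(d, samples, seed=0):
    """Monte Carlo estimate of E[B_1^2] for B uniform in the d-dimensional unit ball."""
    rng = random.Random(seed)
    accepted = 0
    sum_of_squares = 0.0
    while accepted < samples:
        point = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        if sum(c * c for c in point) <= 1.0:  # keep only points inside the unit ball
            accepted += 1
            sum_of_squares += point[0] ** 2
    return sum_of_squares / accepted


variance_d3 = ball_coordinate_variance(3, 50_000)
```

The estimate carries a statistical error of order $10^{-3}$ at this sample size, so agreement with $1/5$ is only expected to two or three digits.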

This result can be interpreted as follows. The limiting dynamics at macroscopic time $T$ is not given by a
single heat kernel, but by a weighted superposition of heat kernels at times $\lambda T$, for $0 \leq \lambda \leq 1$. The factor $\lambda$
expresses a delay arising from backtracking paths, in which the quantum particle "wastes time" by retracing
its steps. If the particle is not backtracking, it is moving according to diffusive dynamics. The backtracking
paths correspond to two-legged subdiagrams, and have the interpretation of a self-energy renormalization in
the language of diagrammatic perturbation theory. Thus, out of the total macroscopic time $T$ during which
the particle moves, a fraction $\lambda$ of $T$ is spent moving diffusively, and a fraction $(1 - \lambda)$ of $T$ backtracking.
Theorem 3.1 gives an explicit expression for the probability density $f(\lambda) = \frac{4}{\pi}\frac{\lambda^2}{\sqrt{1 - \lambda^2}}\,1(0 \leq \lambda \leq 1)$ for the
particle to move during a fraction $\lambda$ of $T$.

Our proof precisely exhibits this phenomenon. As explained in Section 4, the proof is based on an 
expansion of the quantum time evolution in terms of nonbacktracking paths. At time $t = W^{d\kappa}T$, this
expansion yields a weighted superposition of paths of lengths $n = 1, \ldots, [t]$ (higher values of $n$ are strongly
suppressed). Here $n$ is the number of nonbacktracking steps, i.e. the number of steps that contribute to
the effective motion of the particle. The difference $[t] - n$ is the number of steps that the particle spends
backtracking. Our expansion (or, more precisely, its leading order ladder terms) shows that the weight
of a path of $n$ nonbacktracking steps is given by $|\alpha_n(t)|^2$, where $\alpha_n(t)$ is the Chebyshev transform of the
propagator $e^{-it\xi}$ in $\xi$; see (5.3). The probability density $f$ arises from this microscopic picture by setting
$n = [\lambda t]$. Then we have, as proved in Proposition 8.5 below, $t\,|\alpha_{[\lambda t]}(t)|^2 \to f(\lambda)$ weakly as $t \to \infty$.
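This weak convergence can be illustrated numerically. The sketch below takes the closed form $|\alpha_n(t)|^2 = 4(n+1)^2 t^{-2} J_{n+1}(t)^2$ from (5.3) and the density $f(\lambda) = \frac{4}{\pi}\lambda^2(1-\lambda^2)^{-1/2}$ on $[0, 1]$, whose distribution function is $F(\lambda) = \frac{2}{\pi}\big(\arcsin\lambda - \lambda\sqrt{1-\lambda^2}\big)$. It then compares the partial sum $\sum_{n \leq \lambda t}|\alpha_n(t)|^2$ with $F(\lambda)$. The Bessel function is evaluated from its integral representation by the trapezoid rule. The helper names and the values $t = 200$, $\lambda = 1/2$ are illustrative assumptions, not part of the paper.

```python
import math


def bessel_j(k, t, steps=2000):
    """J_k(t) = (1/pi) int_0^pi cos(t sin(theta) - k theta) d(theta), trapezoid rule."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(k * math.pi))  # endpoint values at theta = 0 and theta = pi
    for j in range(1, steps):
        theta = j * h
        total += math.cos(t * math.sin(theta) - k * theta)
    return total * h / math.pi


def alpha_squared(n, t):
    """|alpha_n(t)|^2 according to (5.3)."""
    return (2.0 * (n + 1) / t * bessel_j(n + 1, t)) ** 2


def limiting_cdf(lam):
    """Distribution function of the density f(lambda) = (4/pi) lambda^2 / sqrt(1 - lambda^2)."""
    return (2.0 / math.pi) * (math.asin(lam) - lam * math.sqrt(1.0 - lam * lam))


t = 200.0
lam = 0.5
partial_weight = sum(alpha_squared(n, t) for n in range(int(lam * t) + 1))
total_weight = sum(alpha_squared(n, t) for n in range(301))  # terms with n > 300 are negligible
```

At finite $t$ the individual weights oscillate with $n$, so only the summed quantity should be compared with $F$; the agreement improves as $t$ grows.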

Our second main result shows that the eigenvectors of $H$ have a typical localization length larger than
$W^{1 + d\kappa/2}$, for any $\kappa < 1/3$. For $x \in \Lambda_N$ and $\ell > 0$ we define the characteristic function $P_{x,\ell}$ projecting onto
the complement of an $\ell$-neighbourhood of $x$,

$$P_{x,\ell}(y) := 1(|y - x| \geq \ell).$$

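Let $\epsilon > 0$ and define the random subset $\mathfrak{A}^\omega_{\epsilon,\ell} \subset \mathfrak{A}$ of eigenvectors through

$$\mathfrak{A}^\omega_{\epsilon,\ell} := \Big\{\alpha \in \mathfrak{A} : \sum_x |\psi_\alpha^\omega(x)|\,\|P_{x,\ell}\,\psi_\alpha^\omega\| < \epsilon\Big\}.$$

The set $\mathfrak{A}^\omega_{\epsilon,\ell}$ contains, in particular, all eigenvectors that are exponentially localized in balls of radius $O(\ell)$;
see Corollary 3.4 below for a more general and precise statement.

Theorem 3.3 (Delocalization). Let $\epsilon > 0$ and $0 < \kappa < 1/3$. Then

$$\limsup_{W \to \infty}\,\mathbb{E}\,\frac{\big|\mathfrak{A}^\omega_{\epsilon, W^{1 + \kappa d/2}}\big|}{|\mathfrak{A}|} \leq 2\sqrt{\epsilon},$$

uniformly in $N \geq W^{1 + d/6}$.

Theorem 3.3 implies that the fraction of eigenvectors subexponentially localized on scales $W^{1 + \kappa d/2}$
converges to zero in probability.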



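Corollary 3.4. For fixed $\gamma > 0$ and $K > 0$ define the random subset of eigenvectors

$$\mathfrak{B}^\omega_\ell := \Big\{\alpha \in \mathfrak{A} : \exists\, u \in \Lambda_N : \sum_x |\psi_\alpha^\omega(x)|^2 \exp\Big[\frac{|x - u|}{\ell}\Big]^\gamma \leq K\Big\}.$$

Then for $0 < \kappa < 1/3$ we have

$$\lim_{W \to \infty}\mathbb{E}\,\frac{\big|\mathfrak{B}^\omega_{W^{1 + \kappa d/2}}\big|}{|\mathfrak{A}|} = 0, \tag{3.3}$$

uniformly in $N \geq W^{1 + d/6}$.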



4. Main ideas of the proof 



We need to compute the expectation of the squared matrix elements of the unitary time evolution $e^{-itH/2}$.
A natural starting point is the power series expansion $e^{-itH/2} = \sum_{n \geq 0}(-itH/2)^n/n!$. Unfortunately the
resulting series is unstable for $t \to \infty$, as is manifested by the large cancellations in the sum

$$\mathbb{E}\,\big|\langle \delta_y, e^{-itH/2}\delta_x\rangle\big|^2 = \sum_{n, n' \geq 0}\frac{i^{n - n'}\,t^{n + n'}}{2^{n + n'}\,n!\,n'!}\;\mathbb{E}\,H^n_{xy}H^{n'}_{yx}. \tag{4.1}$$

This can be seen as follows. The expectation

$$\mathbb{E}\,H^n_{xy}H^{n'}_{yx} = \mathbb{E}\sum_{x_1, \ldots, x_{n-1}}\ \sum_{y_1, \ldots, y_{n'-1}} H_{xx_1}H_{x_1x_2}\cdots H_{x_{n-1}y}\,H_{yy_{n'-1}}\cdots H_{y_1x} \tag{4.2}$$

is traditionally represented graphically by drawing the labels $x, x_1, x_2, \ldots, y_1, x$ as vertices of a path, and by
identifying vertices whose labels are identical. Since the matrix elements are centred (i.e. $\mathbb{E}H_{xy} = 0$ for all
$x, y$), each edge must be travelled at least twice in any path that yields a nonzero contribution to (4.2). It
is well known that the leading order contribution to (4.2) is given by the so-called fully backtracking paths.
A fully backtracking path is a path generated by successively applying the transformation $a \mapsto aba$ to the
trivial path $x$. A typical fully backtracking path may be thought of as a tree with double edges. It is not hard
to see that, after summing over $y$, each fully backtracking path yields a contribution of order 1 to (4.2). Also,
the number of fully backtracking paths is of order $4^{n + n'}$, so that the expectation (4.2) is of order $4^{n + n'}$. In
particular, this implies that the main contribution to (4.1) comes from terms satisfying $n + n' \sim t$. Moreover,
the series (4.1) is unstable in the sense that the sum of the absolute values of its summands behaves like $e^{4t}$
as $t \to \infty$.

The large terms in (4.1) systematically cancel each other out similarly to the two-legged subdiagram
renormalization in perturbative field theory. In perturbative renormalization, these cancellations are exploited by
introducing appropriately adjusted fictitious counter-terms. In the current problem, however, we make use
of the Chebyshev transformation, which removes the contribution of all backtracking paths in one step. The
key observation is that, if $U_n$ denotes the $n$-th Chebyshev polynomial of the second kind, then $U_n(H/2)$ can
be expressed in terms of nonbacktracking paths. A nonbacktracking path is a path which contains no subpath
of the form $aba$. Thus the strongest instabilities in (4.1) can be removed if $e^{-itH/2}$ is expanded into a series
of Chebyshev polynomials. This idea appeared first in [7] and has recently been exploited in [26,42] to prove,
among other things, the edge-universality for band matrices. In [42] it is also stated that the same method
can be used to prove delocalization of the edge eigenvectors if $W \gg N^{5/6}$, i.e. to get the bound $\ell \geq W^{6/5}$ on
the localization length $\ell$. Our estimate gives a slightly weaker bound, $\ell \geq W^{7/6}$, for this special case, but it
applies to bulk eigenvectors as well as higher dimensions.

After the Chebyshev transform, we need to compute expectations 

$$\mathbb{E}\,{\sum_{x_1, \ldots, x_{n-1}}}'\ {\sum_{y_1, \ldots, y_{n'-1}}}' H_{xx_1}H_{x_1x_2}\cdots H_{x_{n-1}y}\,H_{yy_{n'-1}}\cdots H_{y_1x}, \tag{4.3}$$

where the summations are restricted to nonbacktracking paths. As above, since $\mathbb{E}H_{ab} = 0$ every matrix
element must appear at least twice in the non-trivial terms of (4.3). Taking the expectation effectively
introduces a pairing, or more generally a lumping, of the factors, which can be conveniently represented
by Feynman diagrams. The main contribution comes from the so-called ladder diagrams, corresponding to
$n = n'$ and $x_i = y_i$. The contribution of these diagrams can be explicitly computed, and shown to behave
diffusively. More precisely: since we express nonbacktracking powers of $H$ as Chebyshev polynomials in $H/2$,
the contribution of each graph to the propagator $e^{-itH/2}$ carries a weight equal to the Chebyshev transform
$\alpha_n(t)$ of $e^{-it\xi}$ in $\xi$. We shall show that $\alpha_n(t)$ is given essentially by a Bessel function of the first kind. In
order to identify the limiting behaviour of the ladder diagrams, we therefore need to analyse a probability
distribution on $\mathbb{N}$ of the form $\{|\alpha_n(t)|^2\}_{n \in \mathbb{N}}$ for large $t$ (Section 8).

The main work consists of proving that the non-ladder diagrams are negligible. Similarly to the basic 
idea of [18-20], the non-ladder diagrams are classified according to their combinatorial complexity. The large 
number of complex diagrams is offset by their small value, expressed in terms of powers of W. Conversely, 
diagrams containing large pieces of ladder subdiagrams have a relatively large contribution but their number 
is small. 

More precisely, focusing only on the pairing diagrams in the Hermitian case, it is easy to see that ladder 
subdiagrams are marginal for power counting. We define the skeleton of a graph by collapsing parallel ladder 
rungs (called bridges) into a single rung. We show that the value of a skeleton diagram is given by a negative 
power of $M \sim CW^d$ that is proportional to the size of the skeleton diagram. This is how the dimension
$d$ enters our estimate. We then sum up all possible ladder subdiagrams corresponding to a given skeleton.
Although the ladder subdiagrams do not yield additional $W$-powers, they represent classical random walks
for which dispersive bounds are available, rendering them summable. The restriction $t \ll W^{d/3}$ comes
from summing up the skeleton diagrams. In Section 11 we present a critical skeleton that shows that this 
restriction is necessary without further resummation or a more refined classification of complex graphs. 



5. The path expansion 

We start by writing the expansion of $e^{-itH/2}$ in terms of nonbacktracking paths by using the Chebyshev
transform.

5.1. The Chebyshev transform of $e^{-it\xi}$. The Chebyshev transform $\alpha_k(t)$ of $e^{-it\xi}$ is defined by

$$e^{-it\xi} = \sum_{k=0}^{\infty}\alpha_k(t)\,U_k(\xi).$$

Here $U_k$ denotes the Chebyshev polynomial of the second kind, defined through

$$U_k(\cos\theta) = \frac{\sin(k+1)\theta}{\sin\theta} \tag{5.1}$$

for $k = 0, 1, 2, \ldots$. The Chebyshev polynomials satisfy the orthogonality relation

$$\frac{2}{\pi}\int_{-1}^{1} d\xi\,\sqrt{1 - \xi^2}\;U_k(\xi)\,U_l(\xi) = \delta_{kl}.$$

Therefore the coefficients $\alpha_k(t)$ are given by

$$\alpha_k(t) = \frac{2}{\pi}\int_{-1}^{1} d\xi\,\sqrt{1 - \xi^2}\;e^{-it\xi}\,U_k(\xi). \tag{5.2}$$

T 7-1 

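The coefficient $\alpha_k(t)$ can be evaluated explicitly using the standard identities (see [32])

$$U_k(\xi) = \frac{\xi\,T_{k+1}(\xi) - T_{k+2}(\xi)}{1 - \xi^2}, \qquad T_{k+2}(\xi) - 2\xi\,T_{k+1}(\xi) + T_k(\xi) = 0,$$

$$\frac{1}{\pi}\int_{-1}^{1}\frac{d\xi}{\sqrt{1 - \xi^2}}\,T_{2l}(\xi)\cos(t\xi) = (-1)^l J_{2l}(t), \qquad \frac{1}{\pi}\int_{-1}^{1}\frac{d\xi}{\sqrt{1 - \xi^2}}\,T_{2l+1}(\xi)\sin(t\xi) = (-1)^l J_{2l+1}(t),$$

$$J_k(t) + J_{k+2}(t) = \frac{2(k+1)}{t}\,J_{k+1}(t).$$

Here $T_k$ denotes the Chebyshev polynomial of the first kind and $J_k$ the Bessel function of the first kind; they
are defined through

$$T_k(\cos\theta) := \cos(k\theta), \qquad J_k(t) := \frac{1}{\pi}\int_0^{\pi} d\theta\,\cos(t\sin\theta - k\theta).$$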

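If $k = 2l$ is even we may therefore compute

$$\begin{aligned}
\alpha_{2l}(t) &= \frac{2}{\pi}\int_{-1}^{1} d\xi\,\sqrt{1 - \xi^2}\,\cos(t\xi)\,U_k(\xi) = \frac{2}{\pi}\int_{-1}^{1} d\xi\,\sqrt{1 - \xi^2}\,\cos(t\xi)\,\frac{\xi\,T_{k+1}(\xi) - T_{k+2}(\xi)}{1 - \xi^2} \\
&= \frac{1}{\pi}\int_{-1}^{1}\frac{d\xi}{\sqrt{1 - \xi^2}}\,\cos(t\xi)\,\big[T_{2l}(\xi) - T_{2l+2}(\xi)\big] = (-1)^l\big[J_{2l}(t) + J_{2l+2}(t)\big] = 2(-1)^l\,\frac{2l+1}{t}\,J_{2l+1}(t).
\end{aligned}$$

If $k = 2l + 1$ is odd a similar calculation yields

$$\alpha_{2l+1}(t) = -2i\,(-1)^l\,\frac{2l+2}{t}\,J_{2l+2}(t).$$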
Thus we have the following result.

Lemma 5.1. We have that

$$e^{-it\xi} = \sum_{k \geq 0}\alpha_k(t)\,U_k(\xi),$$

where

$$\alpha_k(t) = 2\,(-i)^k\,\frac{k+1}{t}\,J_{k+1}(t). \tag{5.3}$$

Also, for all $t \in \mathbb{R}$ we have the identity

$$\sum_{k \geq 0}|\alpha_k(t)|^2 = 1, \tag{5.4}$$

as follows from the orthonormality of the Chebyshev polynomials.
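Lemma 5.1 lends itself to a direct numerical cross-check. The sketch below is only an illustration; the helper names, the quadrature rules and the values $t = 3$, $\xi = 0.3$ are assumptions made here. It evaluates $\alpha_k(t)$ in two ways: from the defining integral (5.2) after the substitution $\xi = \cos\theta$, and from the closed form (5.3) with $J_{k+1}$ computed from its integral representation. It then verifies the expansion of $e^{-it\xi}$ and the normalization (5.4) with the series truncated at $k = 40$.

```python
import cmath
import math


def bessel_j(k, t, steps=2000):
    """J_k(t) = (1/pi) int_0^pi cos(t sin(theta) - k theta) d(theta), trapezoid rule."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(k * math.pi))  # endpoint values at theta = 0 and theta = pi
    for j in range(1, steps):
        theta = j * h
        total += math.cos(t * math.sin(theta) - k * theta)
    return total * h / math.pi


def alpha_closed_form(k, t):
    """Right-hand side of (5.3)."""
    return 2.0 * (-1j) ** k * (k + 1) / t * bessel_j(k + 1, t)


def alpha_quadrature(k, t, steps=2000):
    """Integral (5.2) with xi = cos(theta): (2/pi) int_0^pi sin(theta) sin((k+1) theta) e^{-it cos(theta)}."""
    h = math.pi / steps
    total = 0j
    for j in range(steps):
        theta = (j + 0.5) * h  # midpoint rule
        total += math.sin(theta) * math.sin((k + 1) * theta) * cmath.exp(-1j * t * math.cos(theta))
    return 2.0 / math.pi * h * total


def chebyshev_u(kmax, xi):
    """Values U_0(xi), ..., U_kmax(xi) from U_{k+1} = 2 xi U_k - U_{k-1}."""
    values = [1.0, 2.0 * xi]
    for _ in range(2, kmax + 1):
        values.append(2.0 * xi * values[-1] - values[-2])
    return values[: kmax + 1]


t, xi, kmax = 3.0, 0.3, 40
closed = [alpha_closed_form(k, t) for k in range(kmax + 1)]
numeric = [alpha_quadrature(k, t) for k in range(kmax + 1)]
max_coefficient_gap = max(abs(a - b) for a, b in zip(closed, numeric))
series_value = sum(a * u for a, u in zip(closed, chebyshev_u(kmax, xi)))
total_weight = sum(abs(a) ** 2 for a in closed)
```

The truncation is harmless because $J_{k+1}(t)$ decays faster than exponentially once $k$ exceeds $t$.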

5.2. Expansion in terms of nonbacktracking paths. For $n = 0, 1, 2, \ldots$ let $H^{(n)}$ denote the $n$-th
nonbacktracking power of $H$. It is defined by

$$H^{(n)}_{x_0, x_n} := {\sum_{x_1, \ldots, x_{n-1}}}' H_{x_0x_1}\cdots H_{x_{n-1}x_n},$$

where $\sum'$ means sum under the restriction $x_i \neq x_{i+2}$ for $i = 0, \ldots, n - 2$. We call this restriction the
nonbacktracking condition.

The following key observation is due to Bai and Yin [7]. 

Lemma 5.2. The nonbacktracking powers of $H$ satisfy

$$H^{(0)} = \mathbb{1}, \qquad H^{(1)} = H, \qquad H^{(2)} = H^2 - \frac{M}{M-1}\,\mathbb{1},$$

as well as the recursion relation

$$H^{(n)} = HH^{(n-1)} - H^{(n-2)} \qquad (n \geq 3). \tag{5.5}$$
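The statement of Lemma 5.2 can be tested by brute force on a small example. The following minimal sketch assumes the symmetric Bernoulli ensemble (2.1b) on a periodic one-dimensional lattice with $N = 7$ and $W = 2$, so that $M = 4$. All function names are illustrative. It computes $H^{(n)}$ directly from the definition, by enumerating all paths with $x_i \neq x_{i+2}$, and compares the result with the algebraic formulas of the lemma.

```python
import itertools
import math
import random


def bernoulli_band_matrix(N, W, rng):
    """One sample of the symmetric ensemble (2.1b) on the periodic lattice {0, ..., N-1}."""
    M = 2 * W
    scale = 1.0 / math.sqrt(M - 1)
    H = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(x + 1, N):
            if 1 <= min(y - x, N - (y - x)) <= W:
                H[x][y] = H[y][x] = scale * rng.choice((-1.0, 1.0))
    return H, M


def nonbacktracking_power(H, n):
    """H^(n) from the definition: sum over paths x_0, ..., x_n with x_i != x_{i+2}."""
    N = len(H)
    result = [[0.0] * N for _ in range(N)]
    for path in itertools.product(range(N), repeat=n + 1):
        if any(path[i] == path[i + 2] for i in range(n - 1)):
            continue  # path backtracks somewhere
        weight = 1.0
        for i in range(n):
            weight *= H[path[i]][path[i + 1]]
        result[path[0]][path[n]] += weight
    return result


def matmul(A, B):
    N = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def max_abs_diff(A, B):
    return max(abs(a - b) for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


rng = random.Random(1)
N, W = 7, 2
H, M = bernoulli_band_matrix(N, W, rng)
powers = [nonbacktracking_power(H, n) for n in range(5)]

# H^(2) = H^2 - M/(M-1) * identity
H_squared = matmul(H, H)
expected_second = [[H_squared[i][j] - (M / (M - 1) if i == j else 0.0) for j in range(N)] for i in range(N)]
second_power_error = max_abs_diff(powers[2], expected_second)

# H^(n) = H H^(n-1) - H^(n-2) for n = 3, 4
recursion_errors = []
for n in range(3, 5):
    product = matmul(H, powers[n - 1])
    predicted = [[product[i][j] - powers[n - 2][i][j] for j in range(N)] for i in range(N)]
    recursion_errors.append(max_abs_diff(powers[n], predicted))
```

The check relies on each row of $H$ having exactly $M$ nonzero entries of squared modulus $1/(M-1)$, which is the content of (2.2); it would fail for entries with a genuinely random modulus.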

Proof. For the convenience of the reader we give the simple proof. The cases n = 0, 1, 2 are easily checked. 
Moreover, 

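$$\begin{aligned}
(HH^{(n-1)})_{x_0, x_n} &= \sum_{x_1, \ldots, x_{n-1}}\ \prod_{i=1}^{n-2} 1(x_i \neq x_{i+2})\,H_{x_0x_1}\cdots H_{x_{n-1}x_n} \\
&= \sum_{x_1, \ldots, x_{n-1}}\ \prod_{i=0}^{n-2} 1(x_i \neq x_{i+2})\,H_{x_0x_1}\cdots H_{x_{n-1}x_n} \\
&\quad + \sum_{x_1, \ldots, x_{n-1}} 1(x_0 = x_2)\prod_{i=1}^{n-2} 1(x_i \neq x_{i+2})\,H_{x_0x_1}\cdots H_{x_{n-1}x_n} \\
&= (H^{(n)})_{x_0, x_n} + \sum_{x_3, \ldots, x_{n-1}} 1(x_0 \neq x_4)\prod_{i=3}^{n-2} 1(x_i \neq x_{i+2})\,H_{x_0x_3}H_{x_3x_4}\cdots H_{x_{n-1}x_n} \\
&\qquad\qquad \times \sum_{x_1} 1(x_1 \neq x_3)\,|H_{x_0x_1}|^2 \\
&= (H^{(n)})_{x_0, x_n} + (H^{(n-2)})_{x_0, x_n}.
\end{aligned}$$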

Notice that in the last step we used (2.2). □

Feldheim and Sodin have observed [26,42] that (5.5) is reminiscent of the recursion relation for the
Chebyshev polynomials of the second kind. Let us abbreviate $\tilde U_n(\xi) := U_n(\xi/2)$. Then we have

$$\tilde U_0(\xi) = 1, \qquad \tilde U_1(\xi) = \xi, \qquad \tilde U_2(\xi) = \xi^2 - 1,$$

and for $n \geq 2$

$$\tilde U_n(\xi) = \xi\,\tilde U_{n-1}(\xi) - \tilde U_{n-2}(\xi).$$

Comparing this to Lemma 5.2, we get, following [26,42],

$$H^{(n)} = \tilde U_n(H) - \frac{1}{M-1}\,\tilde U_{n-2}(H).$$

Solving for $\tilde U_n(H)$ yields

$$\tilde U_n(H) = \sum_{k \geq 0}\frac{1}{(M-1)^k}\,H^{(n-2k)},$$

with the convention that $H^{(n)} = 0$ for $n < 0$. Therefore Lemma 5.1 yields

$$e^{-itH/2} = \sum_{n \geq 0}\alpha_n(t)\,\tilde U_n(H) = \sum_{m \geq 0} H^{(m)}\sum_{k \geq 0}\frac{\alpha_{m+2k}(t)}{(M-1)^k}.$$


We have proved the following result. 
Lemma 5.3. We have that 

$$e^{-itH/2} = \sum_{m \geq 0} a_m(t)\,H^{(m)},$$

where

$$a_m(t) := \sum_{k \geq 0}\frac{\alpha_{m+2k}(t)}{(M-1)^k}.$$

6. Graphical representation 

For ease of presentation, we assume throughout the proof of Theorem 3.1 (Sections 6-8) that we are in 
the Hermitian case (2.1a). How to extend our arguments to cover the symmetric case (2.1b) is described in 
Section 9. 

Using Lemma 5.3 we get 

$$\varrho(t, x) = \sum_{n, n' \geq 0} a_n(t)\,\overline{a_{n'}(t)}\;\mathbb{E}\,H^{(n)}_{0x}H^{(n')}_{x0}.$$

Expanding in nonbacktracking paths yields a graphical expansion. Let us write $H^{(n)}_{0x}H^{(n')}_{x0}$ as a sum over
paths $x_0, x_1, \ldots, x_{n+n'-1}, x_0$, where $x_0 = 0$ and $x_n = x$. Such a path is graphically represented as a loop of
$n + n'$ vertices belonging to the set $V_{n,n'} := \{0, \ldots, n + n' - 1\}$; see Figure 6.1. Vertices $i \in V_{n,n'}$ satisfying
the nonbacktracking condition (i.e. $x_{i-1} \neq x_{i+1}$) are drawn using black dots; other vertices are drawn using
white dots. There are $n + n'$ oriented edges $e_0, \ldots, e_{n+n'-1}$ defined by $e_i := (i, i + 1)$ (here, and in the
following, $V_{n,n'}$ is taken to be periodic). We denote by $E_{n,n'} := \{e_0, \ldots, e_{n+n'-1}\}$ the set of edges. In Figure
6.1 the edges are oriented clockwise. Each vertex has an outgoing and an incoming edge, and each edge $e$
has an initial vertex $a(e)$ and final vertex $b(e)$. Moreover, we order the edges using their initial vertices.

Figure 6.1: The graphical representation of paths of vertices.

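Each vertex $i \in V_{n,n'}$ carries a label $x_i \in \Lambda_N$. The labels $\mathbf{x} = (x_0, \ldots, x_{n+n'-1})$ are summed over under
the restriction $Q_x(\mathbf{x}) = 1$, where

$$Q_x(\mathbf{x}) := 1(x_0 = 0)\,1(x_n = x)\Big[\prod_{i=0}^{n+n'-1} 1(1 \leq |x_i - x_{i+1}| \leq W)\Big]\Big[\prod_{i=0}^{n-2} 1(x_i \neq x_{i+2})\Big]\Big[\prod_{i=n}^{n+n'-2} 1(x_i \neq x_{i+2})\Big].$$

The two last products implement the nonbacktracking condition. We define the unordered pair of labels
corresponding to the edge $e$ through

$$\varrho_{\mathbf{x}}(e) := \{x_{a(e)}, x_{b(e)}\}.$$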

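Next, to each configuration of labels $\mathbf{x} = (x_0, \ldots, x_{n+n'-1})$ we assign a lumping $\Gamma = \Gamma(\mathbf{x})$ of the set of
edges $E_{n,n'}$. Here a lumping means a partition of $E_{n,n'}$ or, equivalently, an equivalence relation on $E_{n,n'}$. We
use the notation $\Gamma = \{\gamma\}_{\gamma \in \Gamma}$, where $\gamma \in \Gamma$ is a lump of $\Gamma$, i.e. an equivalence class. The lumping $\Gamma = \Gamma(\mathbf{x})$
associated with the labels $\mathbf{x}$ is defined according to the rule that $e$ and $e'$ are in the same lump $\gamma \in \Gamma$ if and
only if $\varrho_{\mathbf{x}}(e) = \varrho_{\mathbf{x}}(e')$. Let $\mathscr{G}^*_{n,n'}$ denote the set of lumpings of $E_{n,n'}$ obtained in this manner. Thus we may
write

$$\mathbb{E}\,H^{(n)}_{0x}H^{(n')}_{x0} = \sum_{\Gamma \in \mathscr{G}^*_{n,n'}} V_x(\Gamma).$$

Here

$$V_x(\Gamma) := \sum_{\mathbf{x}} Q_x(\mathbf{x})\;\mathbb{E}\,H_{x_0x_1}\cdots H_{x_{n+n'-1}x_0},$$

where the summation is restricted to label configurations yielding the lumping $\Gamma$.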

Next, observe that the expectation of a monomial $\prod_{y,z}(H_{yz})^{\nu_{yz}}$ is nonzero if and only if $\nu_{yz} = \nu_{zy}$ for all
$y, z$ (here we only use that the law of the matrix entries is invariant under rotations of the complex plane).
In particular, $V_x(\Gamma)$ vanishes if one lump $\gamma \in \Gamma$ is of odd size. Defining the subset $\mathscr{G}_{n,n'} \subset \mathscr{G}^*_{n,n'}$ of lumpings
whose lumps are of even size, we find that

$$\mathbb{E}\,H^{(n)}_{0x}H^{(n')}_{x0} = \sum_{\Gamma \in \mathscr{G}_{n,n'}} V_x(\Gamma).$$

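We summarize the key properties of $\mathscr{G}_{n,n'}$.

Lemma 6.1. Let $\Gamma \in \mathscr{G}_{n,n'}$. Then each lump $\gamma \in \Gamma$ is of even size. Moreover, any two edges $e, e' \in \gamma$ in the
same lump $\gamma$ are separated by either at least two edges or a vertex in $\{0, n\}$ (nonbacktracking property).

Next, we give an explicit expression for $V_x(\Gamma)$. We start by assigning to each lump $\gamma \in \Gamma$ an unordered
pair of labels $\varrho_\gamma$. Then we pick a partition $\pi_\gamma$ of $\gamma$ into two subsets of equal size. Abbreviate these families
as $\boldsymbol{\varrho}_\Gamma = \{\varrho_\gamma\}_{\gamma \in \Gamma}$ and $\boldsymbol{\pi}_\Gamma = \{\pi_\gamma\}_{\gamma \in \Gamma}$. Thus we get

$$V_x(\Gamma) = \sum_{\mathbf{x}} Q_x(\mathbf{x})\sum_{\boldsymbol{\varrho}_\Gamma}\sum_{\boldsymbol{\pi}_\Gamma}\Big(\prod_{\gamma \in \Gamma}\Delta_{\mathbf{x}}(\varrho_\gamma, \pi_\gamma)\Big)\Big(\prod_{\gamma \neq \gamma'} 1(\varrho_\gamma \neq \varrho_{\gamma'})\Big)\,\mathbb{E}\,H_{x_0x_1}\cdots H_{x_{n+n'-1}x_0}. \tag{6.1}$$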
Here, for each 7 e T, g 1 ranges over all unordered pairs of labels and 7r 7 ranges over all partitions of 7 into 
two subsets of equal size; A x (p 7 ,7r 7 ) is the indicator function of the following event: For all e e 7 we have 
that £> x (e) = Qj, and 

e, e e 7 belong to the same subset of 

e, e' e 7 belong to different subsets of 7r 7 £ a (e) = a;f>( e ') > x 6(e) = a;o(e') • 

This definition of $\Delta_{\mathbf{x}}(\varrho_\gamma, \pi_\gamma)$ has the following interpretation. All edges in $\gamma$ (corresponding to matrix
elements) have the same unordered pair of labels (and hence represent copies of the same random variable
$H_{yz}$ or its complex conjugate). Moreover, each random variable $H_{yz}$ must appear as many times as its
complex conjugate; random variables indexed by two edges $e, e' \in \gamma$ are identical if $e, e'$ belong to the same
subset of $\pi_\gamma$, and each other's complex conjugates if $e, e'$ belong to different subsets of $\pi_\gamma$.
Note that the expectation in (6.1) is equal to

\[
\frac{1}{(M-1)^{\bar n}} , \tag{6.2}
\]

where $\bar n := \frac{n + n'}{2}$. In particular, $V_x(\Gamma) \ge 0$.



Figure 6.2: A pairing of edges. 

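An important subset of lumpings of $\mathcal{E}_{n,n'}$ is the set of pairings, $\mathscr{P}_{n,n'} \subset \mathscr{G}_{n,n'}$, which contains all lumpings
$\Gamma$ satisfying $|\gamma| = 2$ for all $\gamma \in \Gamma$. We call two-element lumps $\sigma \in \Gamma$ bridges. Given a pairing $\Gamma \in \mathscr{P}_{n,n'}$
we say that $e$ and $e'$ are bridged (in $\Gamma$) if there is a $\sigma \in \Gamma$ such that $\sigma = \{e, e'\}$. Bridges are represented
graphically by drawing a line, for each $\{e, e'\} \in \Gamma$, from the edge $e$ to $e'$; see Figure 6.2. Thus a pairing
$\Gamma \in \mathscr{P}_{n,n'}$ is the edge set of a graph whose vertex set is $\mathcal{E}_{n,n'}$. If $\Gamma$ is a pairing, each bridge $\sigma \in \Gamma$ has a
unique partition $\pi_\sigma$ of its edges, so that the expression (6.1) for $V_x(\Gamma)$ may be rewritten in the simpler form

\[
\begin{aligned}
V_x(\Gamma) = \sum_{\mathbf{x}} Q_x(\mathbf{x}) &\Biggl(\prod_{\{e,e'\} \in \Gamma} 1\bigl(x_{a(e)} = x_{b(e')}\bigr)\, 1\bigl(x_{b(e)} = x_{a(e')}\bigr)\Biggr) \\
&\times \Biggl(\prod_{\sigma \ne \sigma'} \prod_{e \in \sigma} \prod_{e' \in \sigma'} 1\bigl(\varrho_{\mathbf{x}}(e) \ne \varrho_{\mathbf{x}}(e')\bigr)\Biggr) \frac{1}{(M-1)^{\bar n}} .
\end{aligned} \tag{6.3}
\]

The main contribution to the expansion is given by the ladder pairing $L_n \in \mathscr{P}_{n,n}$. It is defined as

\[
L_n := \bigl\{\{e_0, e_{2n-1}\}, \{e_1, e_{2n-2}\}, \dots, \{e_{n-1}, e_n\}\bigr\} .
\]

The ladder is represented graphically in Figure 6.3.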

Figure 6.3: The ladder pairing.



7. The non-ladder lumpings 

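In this section we estimate the contribution of the non-ladder lumpings and show that it vanishes in the
limit $W \to \infty$. Let $\mathscr{G}^*_{n,n'} \subset \mathscr{G}_{n,n'}$ denote the set of non-ladder lumpings, i.e. $\mathscr{G}^*_{n,n'} := \mathscr{G}_{n,n'}$ if $n \ne n'$ and
$\mathscr{G}^*_{n,n} := \mathscr{G}_{n,n} \setminus \{L_n\}$. Similarly, let $\mathscr{P}^*_{n,n'} := \mathscr{P}_{n,n'} \cap \mathscr{G}^*_{n,n'}$ denote the set of non-ladder pairings.

We shall prove the following result.

Proposition 7.1. Let $0 < \kappa < 1/3$ and pick a $\beta$ satisfying $0 < \beta < 2/3 - 2\kappa$. Then there is a constant $C$
such that

\[
\sum_{n,n' \ge 0} \bigl|a_n(\eta T)\, a_{n'}(\eta T)\bigr| \sum_x \sum_{\Gamma \in \mathscr{G}^*_{n,n'}} |V_x(\Gamma)| \le \frac{C}{W^{d\beta}}
\]

for $W$ larger than some $W_0(T, \kappa)$ and $N \ge W^{1+d/6}$.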

The rest of this section is devoted to the proof of Proposition 7.1. 
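7.1. Controlling the non-pairings. Replacing the expectation in (6.1) with (6.2) we get

\[
V_x(\Gamma) = \sum_{\mathbf{x}} Q_x(\mathbf{x}) \sum_{\varrho_\Gamma} \sum_{\pi_\Gamma} \Biggl(\prod_{\gamma \in \Gamma} \Delta_{\mathbf{x}}(\varrho_\gamma, \pi_\gamma)\Biggr) \Biggl(\prod_{\gamma \ne \gamma'} 1(\varrho_\gamma \ne \varrho_{\gamma'})\Biggr) \frac{1}{(M-1)^{\bar n}} .
\]

We start by estimating the sum over all lumpings $\Gamma \in \mathscr{G}^*_{n,n'}$ in terms of a sum over all pairings $\Gamma \in \mathscr{P}^*_{n,n'}$.
Let us define

\[
R_x(\Gamma) := \sum_{\mathbf{x}} Q_x(\mathbf{x}) \sum_{\varrho_\Gamma} \sum_{\pi_\Gamma} \Biggl(\prod_{\gamma \in \Gamma} \Delta_{\mathbf{x}}(\varrho_\gamma, \pi_\gamma)\Biggr) \frac{1}{(M-1)^{\bar n}} . \tag{7.1}
\]

Lemma 7.2. For $n, n' \in \mathbb{N}$ we have

\[
\sum_{\Gamma \in \mathscr{G}^*_{n,n'}} V_x(\Gamma) \le \sum_{\Gamma \in \mathscr{P}^*_{n,n'}} R_x(\Gamma) .
\]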

Proof. Let $\varrho_\gamma$ and $\pi_\gamma$ be given for each $\gamma \in \Gamma$. For each $\gamma$, pick any pairing $\Sigma_\gamma$ of $\gamma$ that is compatible with
$\pi_\gamma$ in the sense that, for each bridge $\sigma \in \Sigma_\gamma$, the two edges of $\sigma$ belong to different subsets of $\pi_\gamma$. If $n = n'$,
we additionally require that not all $\Sigma_\gamma$'s are subsets of the ladder $L_n$ (such a choice is always possible).
Next, set $\varrho_\sigma := \varrho_\gamma$ for all $\sigma \in \Sigma_\gamma$. Note that each bridge $\sigma$ carries a unique partition $\pi_\sigma$. It is then easy to
see that for any pairing $\Sigma_\gamma$ as above, we have

\[
\Delta_{\mathbf{x}}(\varrho_\gamma, \pi_\gamma) \le \prod_{\sigma \in \Sigma_\gamma} \Delta_{\mathbf{x}}(\varrho_\sigma, \pi_\sigma) .
\]

Thus, by partitioning each $\gamma \in \Gamma$ into bridges, we see that each term in $\sum_{\Gamma \in \mathscr{G}^*_{n,n'}} V_x(\Gamma)$ is bounded by a
corresponding term in $\sum_{\Gamma \in \mathscr{P}^*_{n,n'}} R_x(\Gamma)$. In fact, there is an overcounting arising from the different ways of
partitioning $\gamma$ into bridges. □

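Because of Lemma 7.2 we may restrict ourselves to pairings. We estimate $\sum_{\Gamma \in \mathscr{P}^*_{n,n'}} R_x(\Gamma)$. If $\Gamma$ is a
pairing we may write, just like (6.3), the expression (7.1) in the simpler form

\[
R_x(\Gamma) = \sum_{\mathbf{x}} Q_x(\mathbf{x}) \Biggl(\prod_{\{e,e'\} \in \Gamma} 1\bigl(x_{a(e)} = x_{b(e')}\bigr)\, 1\bigl(x_{b(e)} = x_{a(e')}\bigr)\Biggr) \frac{1}{(M-1)^{\bar n}} . \tag{7.2}
\]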



7.2. Collapsing of parallel bridges. Let us introduce the set $\widetilde{\mathscr{P}}^*_{n,n'}$, defined as the set of all non-ladder
pairings of $\mathcal{E}_{n,n'}$. Clearly, $\mathscr{P}^*_{n,n'}$ is a proper subset of $\widetilde{\mathscr{P}}^*_{n,n'}$ (due to the nonbacktracking condition of
Lemma 6.1 which is imposed on pairings in $\mathscr{P}^*_{n,n'}$).

Let $n, n' \ge 0$ and $\Gamma \in \widetilde{\mathscr{P}}^*_{n,n'}$. For any $i, j$ we say that the two bridges $\{e_i, e_j\}$ and $\{e_{i+1}, e_{j-1}\}$ of
$\Gamma$ are parallel if $i + 1, j \notin \{0, n\}$; see Figure 7.1. Two parallel bridges may be collapsed to obtain a new
pairing $\Gamma'$ of a smaller set of edges, in which the parallel bridges are replaced by a single bridge. More
precisely: We obtain $\Gamma' \in \widetilde{\mathscr{P}}^*_{m,m'}$ from $\Gamma \in \widetilde{\mathscr{P}}^*_{n,n'}$ by removing the vertices $i + 1$ and $j$, by creating the
edges $(i, i + 2)$ and $(j - 1, j + 1)$, and by bridging them. Finally, we rename the vertices using the increasing
integers $0, 1, 2, \dots, n + n' - 3$; by definition, the new name of the vertex $n$ is $m$, and $m'$ is defined through
$m + m' + 2 = n + n'$. The converse operation of collapsing bridges, expanding bridges, is self-explanatory.

In the next lemma we iterate the above procedure $\Gamma \mapsto \Gamma'$ until all parallel bridges have been collapsed.
Figure 7.1: Two parallel bridges.

Lemma 7.3. Let $\Gamma \in \widetilde{\mathscr{P}}^*_{n,n'}$. Then there exist $m \le n$, $m' \le n'$, and a pairing $S(\Gamma) \in \widetilde{\mathscr{P}}^*_{m,m'}$ containing no
parallel bridges, such that $\Gamma$ may be obtained from $S(\Gamma)$ by successively expanding bridges. This defines $S(\Gamma)$
uniquely.

Proof. Successively collapse all parallel bridges in $\Gamma$; see Figure 7.2. The result is clearly independent of
the order in which this is done. □
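The collapsing procedure can be made concrete with a small sketch. The encoding below is an assumption made purely for illustration and is not taken from the text: a pairing is stored as a list `bridge_of_edge`, where entry $k$ is an arbitrary identifier of the bridge containing the edge $e_k = (k, k+1)$ (indices cyclic), together with a list `special` flagging the two distinguished vertices $0$ and $n$. The function name is hypothetical.

```python
def collapse_parallel_bridges(bridge_of_edge, special):
    """Collapse parallel bridges until none remain (sketch of Lemma 7.3).

    bridge_of_edge[k]: id of the bridge containing edge e_k = (k, k+1 mod L).
    special[k]: True if vertex k is one of the distinguished vertices 0 or n.
    Returns the same encoding for the skeleton.
    """
    seq, spec = list(bridge_of_edge), list(special)
    while True:
        L = len(seq)
        found = None
        for i in range(L):
            a, b = seq[i], seq[(i + 1) % L]
            if a == b:
                continue
            # e_j is the edge bridged with e_i
            j = next(k for k in range(L) if seq[k] == a and k != i)
            inner = (j - 1) % L
            # need {e_{i+1}, e_{j-1}} to be a bridge made of two distinct edges
            if inner == (i + 1) % L or seq[inner] != b:
                continue
            # the vertices i+1 and j to be removed must not be distinguished
            if spec[(i + 1) % L] or spec[j]:
                continue
            found = ((i + 1) % L, inner, j)
            break
        if found is None:
            return seq, spec
        v1, e2, v2 = found
        # drop the edges e_{i+1}, e_{j-1} and the vertices i+1, j
        seq = [s for k, s in enumerate(seq) if k not in (v1, e2)]
        spec = [s for k, s in enumerate(spec) if k not in (v1, v2)]


T, F = True, False
# The ladder L_3 collapses to a single bridge between the two edges e_0, e_1.
ladder_skeleton = collapse_parallel_bridges([0, 1, 2, 2, 1, 0], [T, F, F, T, F, F])
```

Vertex $0$ is always flagged, so the index arithmetic never wraps around the start of the list when a vertex is removed. The ladder is used here only to exercise the procedure; by definition it is excluded from the non-ladder pairings the lemma is about.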





Figure 7.2: Collapsing parallel bridges to obtain the skeleton pairing.

We call the pairing $\Sigma = S(\Gamma)$ the skeleton of $\Gamma$. The set of skeleton pairings of the edges $\mathcal{E}_{m,m'}$ is denoted
by $\mathscr{S}^*_{m,m'}$.

Note that $\mathscr{S}^*_{m,m'}$ is in general not a subset of $\mathscr{P}^*_{m,m'}$. The following lemma summarizes the key properties
of $\mathscr{S}^*_{m,m'}$.

Lemma 7.4. (i) Each $\Sigma \in \mathscr{S}^*_{m,m'}$ contains no parallel bridges.
(ii) Let $\Sigma \in \mathscr{S}^*_{m,m'}$ and $\sigma = \{e, e'\} \in \Sigma$. Then $e, e'$ are adjacent only if $e \cap e' \in \{0, m\}$.
(iii) If $\bar m := \frac{m + m'}{2} = 1$ then $\mathscr{S}^*_{m,m'} = \emptyset$.

Proof. Statement (i) follows immediately from the definition of $S(\Gamma)$. Statement (ii) is a consequence of
the nonbacktracking property of pairings in $\mathscr{P}^*_{n,n'}$, i.e. Lemma 6.1. To see this, let $\Sigma \in \mathscr{S}^*_{m,m'}$ be of the
form $\Sigma = S(\Gamma)$ for some $\Gamma \in \mathscr{P}^*_{n,n'}$. If $\Sigma = S(\Gamma)$ contains a bridge $\{e, e'\}$ consisting of two consecutive edges
$e, e'$, then $\Gamma$ must also contain a bridge $\{f, f'\}$ consisting of two consecutive edges $f, f'$. If $e \cap e' \notin \{0, m\}$,
then $f \cap f' \notin \{0, n\}$, in contradiction to Lemma 6.1. Statement (iii) is an immediate consequence of (ii) and
the requirement that $L_1 \notin \mathscr{S}^*_{1,1}$. □



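7.3. Contribution of parallel bridges. For given $n$ and $n'$, we estimate $\sum_{\Gamma \in \mathscr{P}^*_{n,n'}} R_x(\Gamma)$ by summing over
skeleton pairings $\Sigma$, followed by summing over all possible ways of expanding the bridges of $\Sigma$.

We observe that a pairing $\Gamma \in \mathscr{P}^*_{n,n'}$ is uniquely determined by its skeleton $\Sigma = S(\Gamma) \in \mathscr{S}^*_{m,m'}$, for
some positive integers $m, m'$, as well as a family $\ell_\Sigma = \{\ell_\sigma\}_{\sigma \in \Sigma}$ satisfying $|\ell_\Sigma| = \bar n$, where $\ell_\sigma$ encodes the
number of parallel bridges that were collapsed to form the bridge $\sigma$. Here $\ell_\sigma \ge 1$ is a positive integer and
$|\ell_\Sigma| := \sum_{\sigma \in \Sigma} \ell_\sigma$. Let $G_{\ell_\Sigma}(\Sigma)$ denote the pairing obtained from $\Sigma$ by expanding the bridge $\sigma$ into $\ell_\sigma$ parallel
bridges, for each $\sigma \in \Sigma$. Thus $\Gamma$ may be recovered from its skeleton through $\Gamma = G_{\ell_\Sigma}(\Sigma)$ for a unique family
$\ell_\Sigma$. For given $p \in \mathbb{N}$, the sum over all pairings $\Gamma$ satisfying $|\Gamma| = p$ therefore becomes

\[
\sum_{n+n'=2p} \; \sum_{\Gamma \in \mathscr{P}^*_{n,n'}} R_x(\Gamma) = \sum_{m+m' \le 2p} \; \sum_{\Sigma \in \mathscr{S}^*_{m,m'}} \; \sum_{\ell_\Sigma \,:\, |\ell_\Sigma| = p} R_x\bigl(G_{\ell_\Sigma}(\Sigma)\bigr) . \tag{7.3}
\]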

Next, we define and estimate the contribution to $R_x(\Gamma)$ of a set of $\ell$ parallel bridges. Let $\ell \ge 1$, and two
labels $y, z$ be given. Then we define

\[
D_\ell(y, z) := \sum_{x_0, \dots, x_\ell} \delta_{y x_0}\, \delta_{z x_\ell} \prod_{i=0}^{\ell-1} 1\bigl(1 \le |x_{i+1} - x_i| \le W\bigr) .
\]

Thus, $D_\ell(y, z)$ is equal to the number of paths of length $\ell$ from $y$ to $z$, whereby each step takes values in
$\{x : 1 \le |x| \le W\}$. (We could also have included the nonbacktracking restriction in the definition of $D_\ell$,
but this is not needed as we only want an upper bound on $R_x(\Gamma)$.) Graphically, $D_\ell$ corresponds to the
contribution of $\ell$ parallel bridges; see Figure 7.3.

Figure 7.3: Summing up $\ell$ parallel bridges.

We need the following straightforward properties of $D_\ell$.

Lemma 7.5. Let $\ell \in \mathbb{N}$. Then for each $y$ we have

\[
\sum_z D_\ell(y, z) = M^\ell .
\]

Moreover, for each $y$ and $z$ we have

\[
D_\ell(y, z) \le M^{\ell - 1}
\]

as well as

\[
D_\ell(y, z) \le \frac{C}{\ell^{d/2}}\, M^{\ell - 1} + \frac{C}{N^d}\, M^\ell
\]

for some constant $C$.

Proof. The first two statements are obvious. The last follows from a standard local central limit theorem;
see for instance the proof in [42]. □
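The first two statements of Lemma 7.5 are easy to check numerically. The sketch below is an illustration only and makes simplifying assumptions that are not in the text: it works in $d = 1$ on the discrete torus $\mathbb{Z}/N\mathbb{Z}$ with $N > 2W$, so that $M = 2W$, and the function name `path_counts` is hypothetical.

```python
def path_counts(N, W, length, start=0):
    """Return the list z -> D_length(start, z) on the torus Z/NZ in d = 1.

    Each step takes a value s with 1 <= |s| <= W; N > 2W is assumed so that
    distinct steps lead to distinct sites.
    """
    steps = [s for s in range(-W, W + 1) if s != 0]
    counts = [0] * N
    counts[start % N] = 1
    for _ in range(length):
        nxt = [0] * N
        for site, c in enumerate(counts):
            if c:
                for s in steps:
                    nxt[(site + s) % N] += c
        counts = nxt
    return counts


N, W, ell = 50, 3, 4
M = 2 * W                      # number of allowed steps in d = 1
D = path_counts(N, W, ell)
total_paths = sum(D)           # should equal M**ell
largest_entry = max(D)         # should not exceed M**(ell - 1)
```

The third bound of the lemma, which encodes the heat kernel decay, is not tested here since its constant $C$ is not specified.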

7.4. Orbits of vertices. Fix $\Gamma \in \widetilde{\mathscr{P}}^*_{n,n'}$. We observe that the product in (7.2) may be interpreted as an
indicator function that fixes labels along paths of vertices. To this end, we define a map $\tau \equiv \tau_\Gamma$ on the
vertex set $\mathcal{V}_{n,n'}$. Start with a vertex $i \in \mathcal{V}_{n,n'}$. Let $e$ be the outgoing edge of $i$ (i.e. $e = (i, i + 1)$), and $e'$ the
edge bridged by $\Gamma$ to $e$. Then we define $\tau i$ as the final vertex of $e'$ (i.e. $e' = (\tau i - 1, \tau i)$). Thus the product
in (7.2) may be rewritten as

\[
\prod_{\{e,e'\} \in \Gamma} 1\bigl(x_{a(e)} = x_{b(e')}\bigr)\, 1\bigl(x_{b(e)} = x_{a(e')}\bigr) = \prod_{i \in \mathcal{V}_{n,n'}} \delta_{x_i x_{\tau i}} .
\]

Starting from any vertex $i \in \mathcal{V}_{n,n'}$ we construct a path $(i, \tau i, \tau^2 i, \dots)$. In this fashion the set of vertices is
partitioned into orbits of $\tau$; see Figure 7.4. Let $[i] \subset \mathcal{V}_{n,n'}$ denote the orbit of the vertex $i \in \mathcal{V}_{n,n'}$.

Figure 7.4: Construction of the orbit $[i]$ of the vertex $i$.
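The orbit construction is a plain permutation-cycle computation, which the following sketch spells out. As an assumption for illustration only, a pairing is encoded as a list whose entry $k$ is an identifier of the bridge containing the edge $e_k = (k, k+1)$, with indices taken cyclically; the function names `orbits` and `ladder` are hypothetical.

```python
def orbits(bridge_of_edge):
    """Orbits of the map tau: i -> final vertex of the edge bridged with e_i."""
    L = len(bridge_of_edge)
    partner = {}
    for k in range(L):
        for q in range(L):
            if q != k and bridge_of_edge[q] == bridge_of_edge[k]:
                partner[k] = q
    # e_i = (i, i+1) is bridged with e_j = (j, j+1); tau(i) = j + 1
    tau = [(partner[i] + 1) % L for i in range(L)]
    seen, result = set(), []
    for i in range(L):
        if i in seen:
            continue
        orbit, v = [], i
        while v not in seen:
            seen.add(v)
            orbit.append(v)
            v = tau[v]
        result.append(orbit)
    return result


def ladder(n):
    """Encoding of the ladder L_n: e_i is bridged with e_{2n-1-i}."""
    return list(range(n)) + list(range(n - 1, -1, -1))


ladder_orbits = orbits(ladder(3))
crossing_orbits = orbits([0, 1, 0, 1])
```

For the ladder $L_n$ this gives $n + 1$ orbits, matching the remark in Section 8 that the labels $x_0, \dots, x_n$ determine all others. For the crossing pairing on four edges all vertices lie in a single orbit.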

Next, let $\Sigma = S(\Gamma) \in \mathscr{S}^*_{m,m'}$ be the skeleton pairing of $\Gamma$, and let the family $\ell_\Sigma$ be defined through
$\Gamma = G_{\ell_\Sigma}(\Sigma)$. The map $\tau \equiv \tau_\Sigma$ on the skeleton pairing $\Sigma$ is defined exactly as for $\Gamma$ above. In order to sum
over all labels $\mathbf{x} = (x_0, \dots, x_{n+n'-1})$ in the expression for $R_x(G_{\ell_\Sigma}(\Sigma))$, we split the set of labels $\mathbf{x}$ into two
parts: labels of vertices between two parallel bridges, and labels associated with vertices of $\Sigma$. In order to
make this precise, we need the following definitions.

Let $Z(\Sigma)$ be the set of orbits of $\Sigma$. It contains the distinguished orbits $[0]$ and $[m]$, which receive the
labels $0$ and $x$ respectively. (Note that we may have $[0] = [m]$, in which case $x$ must be $0$.) We assign a
label $y_\zeta$ to each orbit $\zeta \in Z(\Sigma)$, and define the family $y_\Sigma := \{y_\zeta\}_{\zeta \in Z(\Sigma)}$. Each bridge $\sigma \in \Sigma$ "sits between
two orbits" $\zeta_1(\sigma)$ and $\zeta_2(\sigma)$. More precisely, let $e = (i, i + 1) \in \sigma$ be the smaller edge of $\sigma$. Then we
set $\zeta_1(\sigma) := [i]$ and $\zeta_2(\sigma) := [i + 1]$. (Note that using the larger edge of $\sigma$ in this definition would simply
exchange $\zeta_1(\sigma)$ and $\zeta_2(\sigma)$; this is of no consequence for the following.)
Lemma 7.6. For given $\Sigma \in \mathscr{S}^*_{m,m'}$, $\ell_\Sigma$ and $\Gamma = G_{\ell_\Sigma}(\Sigma) \in \mathscr{P}^*_{n,n'}$ we have

\[
R_x(\Gamma) \le \frac{1}{(M-1)^{\bar n}} \sum_{y_\Sigma} 1\bigl(0 = y_{[0]}\bigr)\, 1\bigl(x = y_{[m]}\bigr) \prod_{\sigma \in \Sigma} D_{\ell_\sigma}\bigl(y_{\zeta_1(\sigma)}, y_{\zeta_2(\sigma)}\bigr) . \tag{7.4}
\]

Proof. The left-hand side of (7.4) is given by the expression (7.2). The summation over all $x_i$'s between
parallel bridges of $\Gamma$ is contained in the factors $D_{\ell_\sigma}$, and the summation over all the remaining $x_i$'s is replaced
by the sum over $y_\Sigma$. We relaxed the nonbacktracking condition in $Q_x(\mathbf{x})$ to obtain an upper bound. □

Next, let $Z^*(\Sigma) := Z(\Sigma) \setminus \{[0]\}$ and define $L(\Sigma) := |Z^*(\Sigma)|$. The set $Z^*(\Sigma)$ is the set of orbits whose
label is summed over in $\sum_x R_x(\Gamma)$. The following lemma gives an upper bound on $L(\Sigma)$. It states, roughly,
that the number of orbits (or free labels) is bounded by $2\bar m / 3$; we refer to it as the 2/3 rule. Compare this
bound with the trivial bound $L(\Sigma) \le \bar m$, which would be sharp if $\Sigma$ were allowed to have parallel bridges.

Lemma 7.7 (The 2/3 rule). Let $\Sigma \in \mathscr{S}^*_{m,m'}$. Then $L(\Sigma) \le \frac{2 \bar m}{3} + \frac{1}{3}$.

Proof. Let $Z'(\Sigma) := Z(\Sigma) \setminus \{[0], [m]\}$. We show that every orbit $\zeta \in Z'(\Sigma)$ consists of at least 3 vertices.
Let $i \in \mathcal{V}_{m,m'}$ belong to $\zeta \in Z'(\Sigma)$. Then, by Lemma 7.4 (ii), we have that $\tau i \ne i$. By assumption,
$\tau i \notin \{0, m\}$. Hence $\tau^2 i \ne i$, for otherwise $\Sigma$ would have two parallel bridges, in contradiction to Lemma
7.4 (i). Therefore the orbit of $i$ contains at least 3 vertices. Note that there are orbits containing exactly 3
vertices, as depicted in Figure 7.4.

The total number of vertices of $\Sigma$ not including the vertices $0$ and $m$ is $2\bar m - 2$, so that we get

\[
3\, |Z'(\Sigma)| \le 2 \bar m - 2 .
\]

The claim follows from the bound $|Z^*(\Sigma)| \le |Z'(\Sigma)| + 1$. □



7.5. Bound on $R_x(\Gamma)$. As in the previous subsection, we fix $\Gamma \in \mathscr{P}^*_{n,n'}$, $\Sigma = S(\Gamma) \in \mathscr{S}^*_{m,m'}$, and $\ell_\Sigma$ satisfying
$\Gamma = G_{\ell_\Sigma}(\Sigma)$.

We start by observing that the product in (7.4) may be rewritten in terms of a multigraph $\Pi(\Sigma)$ on the
vertex set $Z(\Sigma)$. Each factor $D_{\ell_\sigma}(y_{\zeta_1(\sigma)}, y_{\zeta_2(\sigma)})$ yields an edge connecting the orbits $\zeta_1(\sigma)$ and $\zeta_2(\sigma)$. In other
words, there is a one-to-one map, which we denote by $\phi$, between bridges of $\Sigma$ and edges of $\Pi(\Sigma)$; each
bridge $\sigma \in \Sigma$ gives rise to an edge $\phi(\sigma)$ of $\Pi(\Sigma)$ connecting $\zeta_1(\sigma)$ and $\zeta_2(\sigma)$. See Figure 7.5 for an example
of such a multigraph.

Lemma 7.8. There is a subset of bridges $\Sigma_T \subset \Sigma$ of size $|\Sigma_T| = L(\Sigma)$, such that, in the subgraph of $\Pi(\Sigma)$
with the edge set $\phi(\Sigma_T)$, each orbit $\zeta \in Z^*(\Sigma)$ is connected to $[0]$.

Proof. Starting from $\zeta_0 = [0]$, we construct a sequence of orbits $\zeta_0, \zeta_1, \dots, \zeta_{L(\Sigma)}$, and a sequence of bridges
$\sigma_1, \dots, \sigma_{L(\Sigma)}$ with the property that for all $k = 1, \dots, L(\Sigma)$ there is a $k' < k$ such that $\zeta_k$ and $\zeta_{k'}$ are
connected by $\phi(\sigma_k)$.

Assume that $\zeta_0, \dots, \zeta_{k-1}$ have already been constructed. Let $i$ be the smallest vertex of $\mathcal{V}_{m,m'} \setminus (\zeta_0 \cup \dots \cup \zeta_{k-1})$.
Then we set $\zeta_k = [i]$. By construction, the vertex $i - 1$ belongs to an orbit $\zeta_{k'}$ for some $k' < k$. Set
$\sigma_k$ to be the bridge containing the edge $(i - 1, i)$. Hence, by definition of $\Pi(\Sigma)$, we see that $\zeta_k$ and $\zeta_{k'}$ are connected
by $\phi(\sigma_k)$.

The set $\Sigma_T$ is given by $\{\sigma_1, \dots, \sigma_{L(\Sigma)}\}$. □
Figure 7.5: Left: a skeleton pairing $\Sigma$ giving rise to 5 orbits indexed by $Z(\Sigma) = \{1, \dots, 5\}$; the bridges in $\Sigma_T$, for one
possible choice of $\Sigma_T$, are drawn using thick lines. Right: the corresponding multigraph on the vertex set $Z(\Sigma)$; the
edges in $\phi(\Sigma_T)$ are drawn using thick lines.

Because $|\Sigma_T| = L(\Sigma)$, the subgraph of $\Pi(\Sigma)$ with the edge set $\phi(\Sigma_T)$ is a tree that connects all orbits in
$Z^*(\Sigma)$ to $[0]$. Let us call this tree $\mathcal{T}(\Sigma)$. Its root is $[0]$.
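Next, we observe that

\[
|\Sigma \setminus \Sigma_T| \ge 1 . \tag{7.5}
\]

Indeed, using Lemma 7.7 and $\bar m \ge 2$ we find

\[
|\Sigma \setminus \Sigma_T| = \bar m - L(\Sigma) \ge \frac{\bar m}{3} - \frac{1}{3} \ge \frac{1}{3} , \tag{7.6}
\]

and the left-hand side is an integer.

We now estimate (7.4) as follows. Each factor indexed by $\sigma \in \Sigma \setminus \Sigma_T$ is estimated by $\sup_{y,z} D_{\ell_\sigma}(y, z)$.
As it turns out, we need to exploit the heat kernel decay for at least one bridge in $\Sigma \setminus \Sigma_T$. Pick a bridge
$\bar\sigma \in \Sigma \setminus \Sigma_T$ (by (7.5) there is such a bridge). Using Lemma 7.5, we estimate

\[
\sup_{y,z} D_{\ell_\sigma}(y, z) \le M^{\ell_\sigma - 1} \qquad \text{if } \sigma \in \Sigma \setminus (\Sigma_T \cup \{\bar\sigma\}) , \tag{7.7a}
\]

\[
\sup_{y,z} D_{\ell_\sigma}(y, z) \le \frac{C}{\ell_{\bar\sigma}^{d/2}}\, M^{\ell_{\bar\sigma} - 1} + \frac{C}{N^d}\, M^{\ell_{\bar\sigma}} \qquad \text{if } \sigma = \bar\sigma . \tag{7.7b}
\]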

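Since $N \ge W M^{1/6}$ and $M \sim C W^d$ we find

\[
\frac{C}{\ell_{\bar\sigma}^{d/2}}\, M^{\ell_{\bar\sigma} - 1} + \frac{C}{N^d}\, M^{\ell_{\bar\sigma}} \le C M^{\ell_{\bar\sigma} - 1} \Biggl(\frac{1}{\ell_{\bar\sigma}^{1/2}} + \frac{1}{M^{1/6}}\Biggr) ,
\]

where we replaced $d$ with 1 to obtain an upper bound. Thus we get

\[
\sum_x R_x(\Gamma) \le \frac{C}{(M-1)^{\bar n}} \Biggl(\frac{1}{\ell_{\bar\sigma}^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \Biggl(\prod_{\sigma \in \Sigma \setminus \Sigma_T} M^{\ell_\sigma - 1}\Biggr) \sum_{y_\Sigma} 1\bigl(0 = y_{[0]}\bigr) \prod_{\sigma \in \Sigma_T} D_{\ell_\sigma}\bigl(y_{\zeta_1(\sigma)}, y_{\zeta_2(\sigma)}\bigr) .
\]

We perform the summation over $y_\Sigma$ by starting at the leaves of $\mathcal{T}(\Sigma)$ and moving towards the root $[0]$. Each
vertex $\zeta$ of $\mathcal{T}(\Sigma)$ carries a label $y_\zeta$. Let us choose a leaf $\zeta$ of $\mathcal{T}(\Sigma)$, and denote by $\zeta'$ the parent of $\zeta$ in $\mathcal{T}(\Sigma)$.
Let $\sigma \in \Sigma_T$ be the (unique) bridge such that $\phi(\sigma)$ connects $\zeta$ and $\zeta'$. Then summation over $y_\zeta$ yields the
factor

\[
\sum_{y_\zeta} D_{\ell_\sigma}(y_\zeta, y_{\zeta'}) = M^{\ell_\sigma} , \tag{7.8}
\]

by Lemma 7.5. Continuing in this manner until we reach the root, we find

\[
\begin{aligned}
\sum_x R_x(\Gamma) &\le \frac{C}{(M-1)^{\bar n}} \Biggl(\frac{1}{\ell_{\bar\sigma}^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \prod_{\sigma \in \Sigma \setminus \Sigma_T} M^{\ell_\sigma - 1} \prod_{\sigma \in \Sigma_T} M^{\ell_\sigma} \\
&= C \Bigl(\frac{M}{M-1}\Bigr)^{\bar n} \Biggl(\frac{1}{\ell_{\bar\sigma}^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \frac{1}{M^{|\Sigma \setminus \Sigma_T|}} ,
\end{aligned}
\]

where we used $|\ell_\Sigma| = \bar n$. Now (7.6) implies

\[
\frac{1}{M^{|\Sigma \setminus \Sigma_T|}} \le \frac{M^{1/3}}{M^{\bar m / 3}} ,
\]

so that

\[
\sum_x R_x(\Gamma) \le C \Bigl(\frac{M}{M-1}\Bigr)^{\bar n} \Biggl(\frac{1}{\ell_{\bar\sigma}^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \frac{M^{1/3}}{M^{\bar m / 3}} . \tag{7.9}
\]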

Notice that (7.9) results from an $\ell^1$-$\ell^\infty$ summation procedure, where the $\ell^1$-bound (7.8) was used for
propagators associated with bridges in $\Sigma_T$, and the $\ell^\infty$-bound (7.7) for propagators associated with bridges
in $\Sigma \setminus \Sigma_T$. The bound (7.7a) is a simple power counting bound; the bound (7.7b), improved by the heat
kernel decay, is used only for one bridge. Note that in the original setup (2.1) each row and column of $H$
contains $M$ nonzero entries $H_{xy}$, whose positions are determined by the condition $1 \le |x - y| \le W$. If
we removed this last condition and only required that each row and column contain $M$ nonzero entries in
arbitrary locations off the diagonal, then all bounds relying solely on power counting would remain valid. In
particular, (7.9) would be valid without the factor $\ell_{\bar\sigma}^{-1/2} + M^{-1/6}$, which results from the heat kernel decay
associated with the special band structure.

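7.6. Sum over pairings. We may now estimate $\sum_{n+n'=2p} \sum_{\Gamma \in \mathscr{P}^*_{n,n'}} \sum_x R_x(\Gamma)$ for fixed $p$. Let first $p, m, m' \ge 0$
and $\Sigma \in \mathscr{S}^*_{m,m'}$. Then (7.9) yields

\[
\sum_x \; \sum_{\ell_\Sigma \,:\, |\ell_\Sigma| = p} R_x\bigl(G_{\ell_\Sigma}(\Sigma)\bigr) \le C \Bigl(\frac{M}{M-1}\Bigr)^{p} \frac{M^{1/3}}{M^{\bar m/3}} \sum_{\ell_\Sigma \,:\, |\ell_\Sigma| = p} \Biggl(\frac{1}{\ell_{\bar\sigma}^{1/2}} + \frac{1}{M^{1/6}}\Biggr) .
\]

Writing $\ell_1 := \ell_{\bar\sigma}$ and $\ell_2, \dots, \ell_{\bar m}$ for the remaining components of $\ell_\Sigma$, the sum on the right-hand side is equal to

\[
\begin{aligned}
\sum_{\ell_1 + \dots + \ell_{\bar m} = p} \Biggl(\frac{1}{\ell_1^{1/2}} + \frac{1}{M^{1/6}}\Biggr)
&= \sum_{\ell_1 = 1}^{p - \bar m + 1} \Biggl(\frac{1}{\ell_1^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \sum_{\ell_2 + \dots + \ell_{\bar m} = p - \ell_1} 1 \\
&= \sum_{\ell_1 = 1}^{p - \bar m + 1} \Biggl(\frac{1}{\ell_1^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \binom{p - \ell_1 - 1}{\bar m - 2} \\
&\le C p \Biggl(\frac{1}{p^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \frac{p^{\bar m - 2}}{(\bar m - 2)!} .
\end{aligned}
\]

Next, we note that

\[
|\mathscr{S}^*_{m,m'}| \le (2\bar m - 1)(2\bar m - 3) \cdots 3 \cdot 1 \le 2^{\bar m}\, \bar m! .
\]

This expresses the fact that the first edge of $\Sigma$ can be bridged with at most $(2\bar m - 1)$ edges, the next remaining
edge with at most $(2\bar m - 3)$ edges, and so on. Therefore (7.3) and Lemma 7.4 (iii) yield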



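\[
\begin{aligned}
\sum_{n+n'=2p} \; \sum_{\Gamma \in \mathscr{P}^*_{n,n'}} \sum_x R_x(\Gamma)
&\le C \sum_{\substack{m + m' \le 2p \\ \bar m \ge 2}} 2^{\bar m}\, \bar m! \, \Bigl(\frac{M}{M-1}\Bigr)^{p} \frac{M^{1/3}}{M^{\bar m/3}} \, p \Biggl(\frac{1}{p^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \frac{p^{\bar m - 2}}{(\bar m - 2)!} \\
&\le C \frac{M^{1/3}}{p} \Biggl(\frac{1}{p^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \Bigl(\frac{M}{M-1}\Bigr)^{p} \sum_{\bar m = 2}^{p} \Bigl(\frac{C p}{M^{1/3}}\Bigr)^{\bar m} .
\end{aligned}
\]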
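The counting bound on the number of skeletons rests on the elementary fact that a set of $2\bar m$ edges admits $(2\bar m - 1)(2\bar m - 3) \cdots 3 \cdot 1 \le 2^{\bar m} \bar m!$ pairings. The sketch below enumerates all pairings of a small set by brute force to confirm this; the function names are hypothetical and the enumeration ignores the additional constraints (no parallel bridges, non-ladder) that skeletons satisfy, so it only illustrates the upper bound.

```python
from math import factorial


def pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # the first element can be paired with any of the remaining ones
    for k, other in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, other)] + tail


def odd_double_factorial(m):
    """Return (2m - 1)(2m - 3) ... 3 * 1."""
    result = 1
    for k in range(1, 2 * m, 2):
        result *= k
    return result


number_of_pairings = {m: len(list(pairings(range(2 * m)))) for m in range(1, 5)}
```

The recursion mirrors the argument in the text: the first edge has at most $2\bar m - 1$ partners, the next remaining edge at most $2\bar m - 3$, and so on.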



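Thus, Lemma 7.2 yields

\[
\sum_{n+n'=2p} h_{n,n'} \le C \frac{M^{1/3}}{p} \Biggl(\frac{1}{p^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \Bigl(\frac{M}{M-1}\Bigr)^{p} \sum_{\bar m = 2}^{p} \Bigl(\frac{C p}{M^{1/3}}\Bigr)^{\bar m} , \tag{7.10}
\]

where we abbreviated

\[
h_{n,n'} := \sum_x \sum_{\Gamma \in \mathscr{G}^*_{n,n'}} V_x(\Gamma) .
\]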

7.7. Conclusion of the proof. In this subsection we complete the proof of Proposition 7.1 by showing that
the error

\[
E_W := \sum_{n,n'} \bigl|a_n(\eta T)\, a_{n'}(\eta T)\bigr|\, h_{n,n'} \tag{7.11}
\]

satisfies $E_W = o(1)$ as $W \to \infty$, uniformly in $N \ge W^{1+d/6}$.
We begin by deriving bounds on the coefficients $a_n(t)$.

Lemma 7.9. (i) We have

\[
\sum_{n \ge 0} |a_n(t)|^2 = 1 + O(M^{-1}) , \tag{7.12}
\]

uniformly in $t \in \mathbb{R}$.

(ii) We have

\[
|a_n(t)| \le C\, \frac{t^n}{n!} . \tag{7.13}
\]

Proof. We start with (i). Write

\[
\sum_{n \ge 0} |a_n(t)|^2 = \sum_{n \ge 0} \sum_{k, k' \ge 0} \frac{\alpha_{n+2k}(t)\, \overline{\alpha_{n+2k'}(t)}}{(M-1)^{k+k'}} .
\]

The term $k = k' = 0$ yields 1 by (5.4). The rest is bounded in absolute value, by (5.4), through

\[
\sum_{n \ge 0} \sum_{k + k' > 0} \frac{|\alpha_{n+2k}(t)|\, |\alpha_{n+2k'}(t)|}{(M-1)^{k+k'}} \le \sum_{k + k' > 0} \sum_{n \ge 0} \frac{|\alpha_{n+2k}(t)|^2}{(M-1)^{k+k'}} \le \sum_{k + k' > 0} \frac{1}{(M-1)^{k+k'}} = O(M^{-1}) .
\]
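In order to prove (ii), we use the integral representation (see [32])

\[
J_n(t) = \frac{(t/2)^n}{\sqrt{\pi}\, \Gamma(n + \frac{1}{2})} \int_{-1}^{1} d\lambda \; e^{i t \lambda} (1 - \lambda^2)^{n - \frac{1}{2}} .
\]

Therefore

\[
|\alpha_n(t)| \le \frac{2 (n+1)}{t} \, \frac{(t/2)^{n+1}}{\sqrt{\pi}\, \Gamma(n + \frac{3}{2})} \, \frac{\pi}{2} \le \frac{t^n}{n!} . \tag{7.14}
\]

Moreover, (5.4) yields

\[
|\alpha_n(t)| \le 1 . \tag{7.15}
\]

We use the estimate

\[
|a_n(t)| \le \sum_{k \ge 0} \frac{|\alpha_{n+2k}(t)|}{(M-1)^k} .
\]

Let us first consider the case $t \le n$. Then it is easy to see that $\frac{t^{n+2k}}{(n+2k)!} \le \frac{t^n}{n!}$. Together with (7.14) this yields

\[
|a_n(t)| \le \sum_{k \ge 0} \frac{1}{(M-1)^k} \frac{t^{n+2k}}{(n+2k)!} \le \frac{t^n}{n!} \sum_{k \ge 0} \frac{1}{(M-1)^k} \le C\, \frac{t^n}{n!} .
\]

If $t > n$ we have $\frac{t^n}{n!} \ge c$ for some constant $c > 0$. Thus the bound (7.15) yields

\[
|a_n(t)| \le C \le C\, \frac{t^n}{n!} .
\]

□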



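Using the new variables $p := \frac{n + n'}{2}$ and $q := \frac{n - n'}{2}$ we find from the definition (7.11)

\[
E_W \le \sum_{p \ge 0} \sum_{q = -p}^{p} \bigl|a_{p+q}(\eta T)\, a_{p-q}(\eta T)\bigr| \, h_{p+q, p-q} .
\]

Next, we observe that Lemma 7.9 (ii) implies that terms corresponding to $n, n' \gg t = \eta T \sim C M^\kappa T$ are
strongly suppressed. Thus we introduce a cutoff at $p = M^\mu$, where $\kappa < \mu < \frac{1}{3}$. Let us first consider the
terms $p \le M^\mu$. We need to estimate

\[
\begin{aligned}
E'_W &:= \sum_{p=0}^{M^\mu} \sum_{q=-p}^{p} \bigl|a_{p+q}(\eta T)\, a_{p-q}(\eta T)\bigr| \, h_{p+q,p-q} \\
&\le \Biggl(\sum_{p=0}^{M^\mu} \sum_{q=-p}^{p} |a_{p+q}(\eta T)|^2 \, |a_{p-q}(\eta T)|^2\Biggr)^{1/2} \Biggl(\sum_{p=0}^{M^\mu} \sum_{q=-p}^{p} (h_{p+q,p-q})^2\Biggr)^{1/2} \\
&\le C \Biggl(\sum_{p=0}^{M^\mu} \sum_{q=-p}^{p} (h_{p+q,p-q})^2\Biggr)^{1/2} ,
\end{aligned}
\]

where we used Lemma 7.9 (i). Thus,

\[
(E'_W)^2 \le C \sum_{p=0}^{M^\mu} \Biggl[\sum_{q=-p}^{p} h_{p+q,p-q}\Biggr]^2 \le C \sum_{p=2}^{M^\mu} \Biggl[\frac{M^{1/3}}{p} \Biggl(\frac{1}{p^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \Bigl(\frac{M}{M-1}\Bigr)^{p} \sum_{\bar m = 2}^{p} \Bigl(\frac{C p}{M^{1/3}}\Bigr)^{\bar m}\Biggr]^2
\]

by (7.10). For $p \le M^\mu$ and $W$ large enough, the term in the square brackets is bounded by

\[
\frac{M^{1/3}}{p} \Biggl(\frac{1}{p^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \frac{1}{(1 - \frac{1}{M})^{p}} \, C \Bigl(\frac{p}{M^{1/3}}\Bigr)^2 \le \frac{C p^{1/2}}{M^{1/3}} \le \frac{C}{M^{1/6}} .
\]

Thus we find $(E'_W)^2 \le C M^{\mu - 1/3}$.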

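Let us now consider the case $p > M^\mu$, i.e. estimate

\[
\widetilde E_W := \sum_{p > M^\mu} \sum_{q=-p}^{p} \bigl|a_{p+q}(\eta T)\, a_{p-q}(\eta T)\bigr| \, h_{p+q,p-q} .
\]

By (7.13) and the elementary inequality $(p+q)!\,(p-q)! \ge (p!)^2$ we have

\[
|a_{p+q}(t)\, a_{p-q}(t)| \le C \frac{t^{2p}}{(p+q)!\,(p-q)!} \le C \frac{t^{2p}}{(p!)^2} .
\]

This gives

\[
\widetilde E_W \le C \sum_{p > M^\mu} \frac{(\eta T)^{2p}}{(p!)^2} \sum_{q=-p}^{p} h_{p+q,p-q}
\le C \sum_{p > M^\mu} \frac{(\eta T)^{2p}}{(p!)^2} \, \frac{M^{1/3}}{p} \Biggl(\frac{1}{p^{1/2}} + \frac{1}{M^{1/6}}\Biggr) \Bigl(\frac{M}{M-1}\Bigr)^{p} \sum_{\bar m = 2}^{p} \Bigl(\frac{C p}{M^{1/3}}\Bigr)^{\bar m}
\]

by (7.10). Setting $\eta \le C M^\kappa$ yields

\[
\begin{aligned}
\widetilde E_W &\le \sum_{p > M^\mu} \Bigl(\frac{C M^\kappa T}{p}\Bigr)^{2p} \sum_{\bar m = 0}^{p} \Bigl(\frac{C p}{M^{1/3}}\Bigr)^{\bar m} \\
&\le \sum_{p > M^\mu} \Bigl(\frac{C M^\kappa T}{p}\Bigr)^{2p} + \sum_{p > M^\mu} \Bigl(\frac{C M^{2\kappa} T^2}{M^{1/3}\, p}\Bigr)^{p} \\
&\le \sum_{p > M^\mu} \bigl(C M^{\kappa - \mu} T\bigr)^{2p} + \sum_{p > M^\mu} \bigl(C M^{2\kappa - 1/3 - \mu} T^2\bigr)^{p} \\
&\le \bigl(C M^{\kappa - \mu} T\bigr)^{2 M^\mu} + \bigl(C M^{2\kappa - \mu - 1/3} T^2\bigr)^{M^\mu} .
\end{aligned}
\]

Choosing $\mu = 1/3 - \beta$ (where, we recall, $0 < \beta < 2/3 - 2\kappa$) completes the proof of Proposition 7.1.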



8. The ladder pairings 

In this section we analyse the contribution of the ladder pairings, $\sum_{n \ge 0} |a_n(\eta T)|^2\, V_x(L_n)$, and complete the
proof of Theorem 3.1. (Recall that $\eta := W^{d\kappa}$ is the time scale.) Recalling the expression (6.3), and noting
that in the case of the ladder the variables $x_0, \dots, x_n$ determine the value of all variables $x_0, \dots, x_{2n-1}$, we
readily find

\[
\begin{aligned}
V_x(L_n) = \frac{1}{(M-1)^n} \sum_{x_0, \dots, x_n} \delta_{0 x_0}\, \delta_{x x_n} &\prod_{i=0}^{n-1} 1\bigl(1 \le |x_{i+1} - x_i| \le W\bigr) \\
&\times \prod_{i=0}^{n-2} 1(x_i \ne x_{i+2}) \prod_{0 \le i < j \le n-1} 1\bigl(\{x_i, x_{i+1}\} \ne \{x_j, x_{j+1}\}\bigr) .
\end{aligned} \tag{8.1}
\]

Throughout this section we assume that $\eta = W^{d\kappa}$ for some $\kappa < 1/3$.

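We perform a series of steps to simplify the expression (8.1). In a first step, we get rid of the last product.

Lemma 8.1. Under the assumptions of Proposition 7.1 we have

\[
\sum_{n \ge 0} |a_n(\eta T)|^2\, V_x(L_n) = \sum_{n \ge 0} |a_n(\eta T)|^2\, V^1_x(n) - E^1_x ,
\]

where

\[
V^1_x(n) := \frac{1}{(M-1)^n} \sum_{\mathbf{x} \in \Lambda_N^{n+1}} \delta_{0 x_0}\, \delta_{x x_n} \prod_{i=0}^{n-1} 1\bigl(1 \le |x_{i+1} - x_i| \le W\bigr) \prod_{i=0}^{n-2} 1(x_i \ne x_{i+2})
\]

and

\[
\sum_x |E^1_x| \le \frac{C}{W^{d\beta}} .
\]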

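Proof. For each $\mathbf{x} = (x_0, \dots, x_n) \in \Lambda_N^{n+1}$ we write $1 = \sum_{P} \Delta_P(\mathbf{x})$, where the sum ranges
over all partitions $P$ of the set $\{0, \dots, n - 1\}$, and $\Delta_P(\mathbf{x})$ is the indicator function

\[
\Delta_P(\mathbf{x}) := \prod_{0 \le i < j \le n-1}
\begin{cases}
1\bigl(\{x_i, x_{i+1}\} = \{x_j, x_{j+1}\}\bigr) & \text{if } i \text{ and } j \text{ belong to the same lump of } P , \\
1\bigl(\{x_i, x_{i+1}\} \ne \{x_j, x_{j+1}\}\bigr) & \text{if } i \text{ and } j \text{ belong to different lumps of } P .
\end{cases}
\]

Notice that if $P = P_0 := \{\{0\}, \dots, \{n - 1\}\}$ then $\Delta_P(\mathbf{x})$ is the last product of (8.1). Let us define

\[
E^1_x(n) := \frac{1}{(M-1)^n} \sum_{\mathbf{x} \in \Lambda_N^{n+1}} \delta_{0 x_0}\, \delta_{x x_n} \prod_{i=0}^{n-1} 1\bigl(1 \le |x_{i+1} - x_i| \le W\bigr) \prod_{i=0}^{n-2} 1(x_i \ne x_{i+2}) \sum_{P \ne P_0} \Delta_P(\mathbf{x}) .
\]

Thus, by definition, we have $V_x(L_n) = V^1_x(n) - E^1_x(n)$, and we set $E^1_x := \sum_{n \ge 0} |a_n(\eta T)|^2 E^1_x(n)$.

Next, we estimate $\sum_x |E^1_x|$. We begin by observing that each partition $P$ of $\{0, \dots, n - 1\}$ uniquely
defines a lumping $\Gamma(P) \in \mathscr{G}_{n,n}$. Indeed, each lump $p \in P$ gives rise to the lump $\gamma \in \Gamma(P)$ defined by
$\gamma = \bigcup_{i \in p} \{e_i, e_{2n-1-i}\}$. In particular, $\Gamma(P) \ne \Gamma(P')$ if $P \ne P'$. We now claim that

\[
\frac{1}{(M-1)^n} \sum_{\mathbf{x} \in \Lambda_N^{n+1}} \delta_{0 x_0}\, \delta_{x x_n} \prod_{i=0}^{n-1} 1\bigl(1 \le |x_{i+1} - x_i| \le W\bigr) \prod_{i=0}^{n-2} 1(x_i \ne x_{i+2}) \, \Delta_P(\mathbf{x}) \le V_x(\Gamma(P)) .
\]

This can be directly read off (6.1); there is in fact an overcounting arising from the summation over $\pi_\Gamma$.
Thus we find

\[
\sum_x |E^1_x| \le \sum_x \sum_{n \ge 0} |a_n(\eta T)|^2 \sum_{P \ne P_0} V_x(\Gamma(P)) \le \sum_x \sum_{n \ge 0} |a_n(\eta T)|^2 \sum_{\Gamma \in \mathscr{G}^*_{n,n}} V_x(\Gamma) ,
\]

where $P$ ranges over the partitions of $\{0, \dots, n-1\}$. Invoking Proposition 7.1 completes the proof. □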

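In a second step, we get rid of the second to last product in (8.1), i.e. the nonbacktracking condition.

Lemma 8.2. For any $T \ge 0$ we have

\[
\sum_{n \ge 0} |a_n(\eta T)|^2\, V^1_x(n) = \sum_{n \ge 0} |a_n(\eta T)|^2\, V^2_x(n) - E^2_x ,
\]

where

\[
V^2_x(n) := \frac{1}{(M-1)^n} \sum_{\mathbf{x} \in \Lambda_N^{n+1}} \delta_{0 x_0}\, \delta_{x x_n} \prod_{i=0}^{n-1} 1\bigl(1 \le |x_{i+1} - x_i| \le W\bigr)
\]

and

\[
\sum_x |E^2_x| \le \frac{C}{W^{2d/3}} .
\]

Proof. We find

\[
E^2_x = \sum_{n \ge 0} |a_n(\eta T)|^2 \frac{1}{(M-1)^n} \sum_{\mathbf{x} \in \Lambda_N^{n+1}} \delta_{0 x_0}\, \delta_{x x_n} \Biggl(\prod_{i=0}^{n-1} 1\bigl(1 \le |x_{i+1} - x_i| \le W\bigr)\Biggr) \Biggl[1 - \prod_{i=0}^{n-2} 1(x_i \ne x_{i+2})\Biggr] .
\]

The expression in the square brackets is equal to

\[
1 - \prod_{i=0}^{n-2} \bigl(1 - 1(x_i = x_{i+2})\bigr) = \sum_{k=1}^{n-1} (-1)^{k+1} \sum_{0 \le i_1 < \dots < i_k \le n-2} \; \prod_{j=1}^{k} 1\bigl(x_{i_j} = x_{i_j + 2}\bigr) .
\]

Each constraint $x_{i_j} = x_{i_j+2}$ fixes one step of the path and therefore removes one factor $M$ from the sum over $\mathbf{x}$. Therefore summing over $x$ yields

\[
\sum_x |E^2_x| \le \sum_{n \ge 0} |a_n(\eta T)|^2 \frac{1}{(M-1)^n} \sum_{k=1}^{n-1} \binom{n-1}{k} M^{n-k}
= \sum_{n \ge 0} |a_n(\eta T)|^2 \Bigl(\frac{M}{M-1}\Bigr)^{n} \Biggl[\Bigl(1 + \frac{1}{M}\Bigr)^{n-1} - 1\Biggr] .
\]

We introduce a cutoff at $n = M^{1/3}$. The part $n \le M^{1/3}$ is bounded by

\[
\sum_{n \ge 0} |a_n(\eta T)|^2 \Bigl(\frac{M}{M-1}\Bigr)^{M^{1/3}} \Biggl[\Bigl(1 + \frac{1}{M}\Bigr)^{M^{1/3}} - 1\Biggr] \le C \bigl(e^{M^{-2/3}} - 1\bigr) \le \frac{C}{M^{2/3}} ,
\]

by Lemma 7.9 (i). The part $n > M^{1/3}$ is estimated using Lemma 7.9 (ii), exactly as in the estimate of $\widetilde E_W$
in Section 7.7. □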
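The two combinatorial identities used in the proof of Lemma 8.2 can be checked by brute force. In the sketch below, which is an illustration only, each indicator $1(x_i = x_{i+2})$ is replaced by an abstract bit $b_i \in \{0, 1\}$; the function names are hypothetical, and exact rational arithmetic is used for the binomial sum.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def inclusion_exclusion_rhs(bits):
    """sum_{k>=1} (-1)^{k+1} sum_{i_1<...<i_k} prod_j bits[i_j]."""
    total = 0
    for k in range(1, len(bits) + 1):
        for idx in combinations(range(len(bits)), k):
            term = 1
            for i in idx:
                term *= bits[i]
            total += (-1) ** (k + 1) * term
    return total


def complement_of_product(bits):
    """1 - prod_i (1 - bits[i])."""
    prod_value = 1
    for b in bits:
        prod_value *= 1 - b
    return 1 - prod_value


def binomial_sum(n, M):
    """(M-1)^{-n} * sum_{k=1}^{n-1} binom(n-1, k) M^{n-k}, as an exact fraction."""
    s = sum(comb(n - 1, k) * Fraction(M) ** (n - k) for k in range(1, n))
    return s / Fraction(M - 1) ** n


def closed_form(n, M):
    """(M/(M-1))^n * ((1 + 1/M)^{n-1} - 1), as an exact fraction."""
    return (Fraction(M, M - 1) ** n) * ((1 + Fraction(1, M)) ** (n - 1) - 1)


identity_holds = all(
    inclusion_exclusion_rhs(bits) == complement_of_product(bits)
    for bits in product((0, 1), repeat=5)
)
```

Both checks are exact, so they hold for every choice of the small parameters used here.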
We summarize what we have proved so far.

Lemma 8.3. Under the assumptions of Proposition 7.1 we have

\[
\sum_{n \ge 0} |a_n(\eta T)|^2\, V_x(L_n) = \sum_{n \ge 0} |\alpha_n(\eta T)|^2\, P_x(n) + E_x ,
\]

where

\[
P_x(n) := \frac{1}{M^n} \sum_{\mathbf{x} \in \Lambda_N^{n+1}} \delta_{0 x_0}\, \delta_{x x_n} \prod_{i=0}^{n-1} 1\bigl(1 \le |x_{i+1} - x_i| \le W\bigr)
\]

and

\[
\sum_x |E_x| \le \frac{C}{W^{d\beta}} .
\]

Proof. The claim follows from Lemmas 8.1 and 8.2, combined with an argument identical to the proof of
Lemma 7.9 (i) that allows us to replace $|a_n(t)|^2$ with $|\alpha_n(t)|^2$. We replaced the factor $\frac{1}{(M-1)^n}$ with $\frac{1}{M^n}$ by
introducing a cutoff at $n = M^{1/3}$, exactly as in the proof of Lemma 8.2. □

The expression $P_x(n)$ is the (normalized) number of paths in $\mathbb{Z}^d$ of length $n$ from $0$ to any point in the
set $x + N\mathbb{Z}^d$, whereby each step takes values in $\{y : 1 \le |y| \le W\}$.

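In a third step, we use the central limit theorem to replace $P_x(n)$ with a Gaussian. Recall the definition
of the heat kernel

\[
G(T, X) = \Bigl(\frac{d+2}{2\pi T}\Bigr)^{d/2} e^{-\frac{d+2}{2T} |X|^2} .
\]

Lemma 8.4. Let $\varphi \in C_b(\mathbb{R}^d)$ and $T \ge 0$. Then we have

\[
\lim_{W \to \infty} \sum_{x \in \Lambda_N} P_x([\eta T])\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \int dX \, G(T, X)\, \varphi(X) , \tag{8.2}
\]

where $[\cdot]$ denotes the integer part.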

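Proof. Let $\widetilde P_x(n)$ denote the normalized number of paths in $\mathbb{Z}^d$ of length $n$ from $0$ to $x$, whereby each step
takes values in $\{y : 1 \le |y| \le W\}$. Then we have

\[
\sum_{x \in \Lambda_N} P_x(n)\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \sum_{x \in \Lambda_N} \sum_{\nu \in \mathbb{Z}^d} \widetilde P_{x + N\nu}(n)\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \sum_{x \in \mathbb{Z}^d} \widetilde P_x(n)\, \varphi\Bigl(\frac{\pi(x)}{W^{1 + d\kappa/2}}\Bigr) ,
\]

where $\pi(x)$ is defined through $\pi(x) \in \Lambda_N$ and $x - \pi(x) \in N\mathbb{Z}^d$. Define the sequence of i.i.d. random variables
$A_1, A_2, \dots$ whose law is $\frac{1}{M} \sum_{a \in \mathbb{Z}^d} 1(1 \le |a| \le W)\, \delta_a$, where $\delta_a$ denotes the point measure at $a$. Then we
have

\[
\sum_{x \in \Lambda_N} P_x([\eta T])\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \mathbb{E}\, \varphi\Bigl(\frac{\pi(A_1 + \dots + A_{[\eta T]})}{W^{1 + d\kappa/2}}\Bigr) . \tag{8.3}
\]

Next, we introduce the partition

\[
1 = 1\bigl(|A_1 + \dots + A_{[\eta T]}| < N/2\bigr) + 1\bigl(|A_1 + \dots + A_{[\eta T]}| \ge N/2\bigr)
\]

in the expectation in (8.3). The second resulting term is bounded by

\[
\|\varphi\|_\infty \, \mathbb{P}\bigl(|A_1 + \dots + A_{[\eta T]}| \ge N/2\bigr) .
\]

This vanishes in the limit $W \to \infty$ by the central limit theorem, since $\frac{N}{W \eta^{1/2}} \to \infty$ by assumption.

The first term resulting from the partition is

\[
\mathbb{E}\, \varphi\Bigl(\frac{A_1 + \dots + A_{[\eta T]}}{W^{1 + d\kappa/2}}\Bigr) 1\bigl(|A_1 + \dots + A_{[\eta T]}| < N/2\bigr) = \mathbb{E}\, \varphi\Bigl(\frac{A_1 + \dots + A_{[\eta T]}}{W^{1 + d\kappa/2}}\Bigr) + o(1) ,
\]

by the same argument as above. Therefore we get

\[
\sum_{x \in \Lambda_N} P_x([\eta T])\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \mathbb{E}\, \varphi\Bigl(\frac{A_1 + \dots + A_{[\eta T]}}{W^{1 + d\kappa/2}}\Bigr) + o(1) = \mathbb{E}\, \varphi\Bigl(\frac{B_1 + \dots + B_{[\eta T]}}{\sqrt{\eta}}\Bigr) + o(1) ,
\]

where $B_i := \frac{A_i}{W}$. The covariance matrix of $B_i$ is $\frac{1}{d+2} \mathbb{1} + o(1)$, and the claim (8.2) follows by the central
limit theorem. □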
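The normalization of the limiting Gaussian comes from the second moment of a single rescaled step. As a rough numerical illustration, the following sketch computes the variance of one coordinate of $A_i / W$ for the uniform law on $\{a \in \mathbb{Z}^d : 1 \le |a| \le W\}$ and compares it with $1/(d+2)$. It assumes that $|\cdot|$ is the Euclidean norm, and the function name `step_variance` is hypothetical.

```python
from itertools import product


def step_variance(W, d):
    """Return (variance of the first coordinate of A/W, number M of allowed steps).

    A is uniform on the lattice points a in Z^d with 1 <= |a| <= W,
    where |a| is taken to be the Euclidean norm (an assumption of this sketch).
    """
    total, count = 0, 0
    for a in product(range(-W, W + 1), repeat=d):
        r2 = sum(c * c for c in a)
        if 1 <= r2 <= W * W:
            total += a[0] * a[0]   # the mean vanishes by symmetry
            count += 1
    return total / count / W ** 2, count


variance_1d, M_1d = step_variance(60, 1)
variance_2d, M_2d = step_variance(60, 2)
```

For moderate $W$ the values are already close to $1/3$ in $d = 1$ and $1/4$ in $d = 2$; the remaining discrepancy is the $o(1)$ term in the covariance of $B_i$.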




Figure 8.1: The functions $f_t(\lambda)$, $f(\lambda)$ (left) and $F_t(\lambda)$, $F(\lambda)$ (right). Here we chose $t = 150$.

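In a fourth and final step, we replace the probability distribution $|\alpha_n(t)|^2$ with its asymptotic distribution.
For the following we fix some test function $\varphi \in C_b(\mathbb{R}^d)$. Testing against $\varphi$ in Lemma 8.3 yields

\[
\sum_{n \ge 0} |a_n(\eta T)|^2 \sum_x V_x(L_n)\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \sum_{n \ge 0} |\alpha_n(\eta T)|^2 \sum_x P_x(n)\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) + O\Bigl(\frac{1}{W^{d\beta}}\Bigr) . \tag{8.4}
\]

While the distribution $|\alpha_n(t)|^2$ has no limit as $t \to \infty$, it turns out that the rescaled distribution,

\[
f_t(\lambda) := t\, |\alpha_{[t\lambda]}(t)|^2 ,
\]

converges weakly to

\[
f(\lambda) := \frac{4}{\pi} \frac{\lambda^2}{\sqrt{1 - \lambda^2}} \, 1(0 \le \lambda < 1) .
\]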

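In order to prove this, we consider the integrated distribution

\[
F_t(\lambda) := \int_0^\lambda d\xi \, f_t(\xi) .
\]

We now show that $F_t(\lambda)$ converges pointwise to $F(\lambda) = \int_0^\lambda f$. See Figure 8.1 for a graph of the functions
$f_t$, $f$, $F_t$, $F$.

Proposition 8.5. The pointwise limit

\[
F(\lambda) := \lim_{t \to \infty} F_t(\lambda)
\]

exists for all $\lambda \ge 0$ and satisfies

\[
F(\lambda) = \int_0^\lambda d\xi \, \frac{4}{\pi} \frac{\xi^2}{\sqrt{1 - \xi^2}} = \frac{2}{\pi} \Bigl(\arcsin \lambda - \lambda \sqrt{1 - \lambda^2}\Bigr) \qquad (\lambda \in [0, 1]) , \tag{8.5a}
\]

\[
F(\lambda) = 1 \qquad (\lambda \ge 1) . \tag{8.5b}
\]

Proof. See Appendix A. □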
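The closed form in (8.5a) can be sanity-checked numerically: its derivative should reproduce the density $f$, and its value at $\lambda = 1$ should be $1$. The sketch below does this with a central finite difference and a midpoint rule. It only tests the limiting functions $f$ and $F$, not the convergence $F_t \to F$, and the names `f`, `F` and `midpoint_integral` are chosen here for illustration.

```python
import math


def f(lam):
    """Limiting density f(lambda) = (4/pi) lambda^2 / sqrt(1 - lambda^2) on [0, 1)."""
    if 0 <= lam < 1:
        return 4 * lam ** 2 / (math.pi * math.sqrt(1 - lam ** 2))
    return 0.0


def F(lam):
    """Limiting distribution function from (8.5a) and (8.5b)."""
    if lam <= 0:
        return 0.0
    if lam >= 1:
        return 1.0
    return (2 / math.pi) * (math.asin(lam) - lam * math.sqrt(1 - lam ** 2))


def midpoint_integral(g, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))


h = 1e-5
derivative_errors = [
    abs((F(lam + h) - F(lam - h)) / (2 * h) - f(lam)) for lam in (0.3, 0.5, 0.8)
]
integral_error = abs(midpoint_integral(f, 0.0, 0.5) - F(0.5))
```

The density has an integrable singularity at $\lambda = 1$, which is why the quadrature check is restricted to $[0, 1/2]$.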

In order to conclude the proof of Theorem 3.1, we need the following result.

Proposition 8.6. Let $T \ge 0$. Then

\[
\lim_{W \to \infty} \sum_{n \ge 0} |a_n(\eta T)|^2 \sum_x V_x(L_n)\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \int_0^1 d\lambda \, f(\lambda) \int dX \, G(\lambda T, X)\, \varphi(X) .
\]

Indeed, Theorem 3.1 is an immediate consequence of Propositions 7.1 and 8.6. The rest of this section is
devoted to the proof of Proposition 8.6.

We begin by observing that the family of probability measures defined by the densities $\{f_t\}_{t \ge 0}$ is tight,
so that we may cut out values of $\lambda$ in the range $[0, \delta) \cup (1 - \delta, \infty)$.

Lemma 8.7. Let $\varepsilon > 0$. Then there is a $\delta > 0$ and a $t_0 \ge 0$ such that

F(S) + l-F(l-8) < jj-i- 

and 

F t (8) + 1-F t (l-S) sc ^ 

for all $t \ge t_0$.

Proof. By Proposition 8.5 we have that 

\[
F(\delta) + 1 - F(1 - \delta) \to 0 \tag{8.6}
\]

as 8 — > 0. Choose <5 > small enough that the left-hand side of (8.6) is bounded by — ■ Moreover, 
Proposition 8.5 also implies that there is a to such that 

F t (6) + 1 - F t (l - 8) < F(<5) + 1 - F(l - 5) + 
for all t ^ t - □ 
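Now by (8.4), Proposition 8.6 will follow if we can show

\[
\sum_{n \ge 0} |\alpha_n(\eta T)|^2 \sum_x P_x(n)\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \int_0^1 d\lambda \, f(\lambda) \int dX \, G(\lambda T, X)\, \varphi(X) + o(1) ,
\]

i.e.

\[
\int_0^\infty d\lambda \, f_{\eta T}(\lambda) \sum_x P_x([\eta T \lambda])\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \int_0^1 d\lambda \, f(\lambda) \int dX \, G(\lambda T, X)\, \varphi(X) + o(1) . \tag{8.7}
\]

Lemma 8.7 implies that in order to prove (8.7) it suffices to prove

\[
\int_\delta^{1-\delta} d\lambda \, f_{\eta T}(\lambda) \sum_x P_x([\eta T \lambda])\, \varphi\Bigl(\frac{x}{W^{1 + d\kappa/2}}\Bigr) = \int_\delta^{1-\delta} d\lambda \, f(\lambda) \int dX \, G(\lambda T, X)\, \varphi(X) + o(1) , \tag{8.8}
\]

for every $\delta > 0$.

Next, note that, by Lemma 8.4, the sum on the left-hand side of (8.8) converges to $\int dX \, G(\lambda T, X)\, \varphi(X)$
for each $\lambda \in [\delta, 1 - \delta]$. In order to invoke the dominated convergence theorem, we need an integrable bound
on $f_t(\lambda)$.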

Lemma 8.8. Let $\delta > 0$. Then there is a $C > 0$ such that $f_t(\lambda) \le C$ for all $\lambda \in [\delta, 1 - \delta]$ and $t$ large enough.

Proof. From Lemma 5.1 we get

\[
|\alpha_{[t\lambda]}(t)|^2 \, t \le C\, |J_{[t\lambda]+1}(t)|^2 \, t .
\]

We estimate this using the following result due to Krasikov (see [36], Theorem 2). Setting $\mu := (2\nu + 1)(2\nu + 3)$
and assuming that $\nu > -1/2$ and $t > \sqrt{\mu + \mu^{2/3}}/2$, we have the bound

\[
J_\nu(t)^2 \le \frac{4}{\pi} \, \frac{4 t^2 - (2\nu + 1)(2\nu + 5)}{(4 t^2 - \mu)^{3/2} - \mu} .
\]

Setting $\nu = [t\lambda] + 1$ yields $|J_{[t\lambda]+1}(t)|^2 \le \frac{C}{t}$ for $\lambda \in (\delta, 1 - \delta)$ and $t$ large enough. This completes the
proof. □

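By Lemmas 8.8 and 8.4, it is enough to prove that

\[
\int_\delta^{1-\delta} d\lambda \, f_{\eta T}(\lambda) \int dX \, G(\lambda T, X)\, \varphi(X) = \int_\delta^{1-\delta} d\lambda \, f(\lambda) \int dX \, G(\lambda T, X)\, \varphi(X) + o(1) . \tag{8.9}
\]

Let us abbreviate

\[
g(\lambda) := \int dX \, G(\lambda T, X)\, \varphi(X) .
\]

The proof of Proposition 8.6 is therefore completed by the following result.

Lemma 8.9. Let $\delta > 0$. Then

\[
\lim_{t \to \infty} \int_\delta^{1-\delta} d\lambda \, f_t(\lambda)\, g(\lambda) = \int_\delta^{1-\delta} d\lambda \, f(\lambda)\, g(\lambda) .
\]

Proof. The proof is a simple integration by parts. It is easy to check that on $[\delta, 1 - \delta]$ the function $g$ is
smooth and its derivative is bounded. We find

\[
\int_\delta^{1-\delta} d\lambda \, f_t(\lambda)\, g(\lambda) = \int_\delta^{1-\delta} d\lambda \, F_t'(\lambda)\, g(\lambda) = - \int_\delta^{1-\delta} d\lambda \, F_t(\lambda)\, g'(\lambda) + F_t(1 - \delta)\, g(1 - \delta) - F_t(\delta)\, g(\delta) .
\]

Proposition 8.5 and dominated convergence yield the claim. □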
9. Symmetric matrices

In this section we describe how to extend the argument of Sections 6 - 8 to the symmetric case (2.1b). While
in the Hermitian case (2.1a) we had

\[
\mathbb{E}\, H_{xy} H_{yx} = \frac{1}{M-1} , \qquad \mathbb{E}\, H_{xy} H_{xy} = 0 ,
\]

we now have

\[
\mathbb{E}\, H_{xy} H_{yx} = \mathbb{E}\, H_{xy} H_{xy} = \frac{1}{M-1} . \tag{9.1}
\]



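Since the distribution of $H_{xy}$ is symmetric, Lemma 6.1 also holds in the symmetric case. However, (9.1)
implies that there is no restriction on the order of the labels associated with an edge. Thus we replace (6.1)
with

\[
V_x(\Gamma) = \sum_{\mathbf{x}} Q_x(\mathbf{x}) \sum_{\varrho_\Gamma} \Biggl(\prod_{\gamma \in \Gamma} \Delta_{\mathbf{x}}(\varrho_\gamma)\Biggr) \Biggl(\prod_{\gamma \ne \gamma'} 1(\varrho_\gamma \ne \varrho_{\gamma'})\Biggr) \frac{1}{(M-1)^{\bar n}} , \tag{9.2}
\]

where

\[
\Delta_{\mathbf{x}}(\varrho_\gamma) := \prod_{e \in \gamma} 1\bigl(\varrho_{\mathbf{x}}(e) = \varrho_\gamma\bigr) .
\]

Next, we define the set $\mathscr{G}^*_{n,n'}$ as the set of lumpings $\mathscr{G}_{n,n'}$ without the complete ladder and the complete
antiladder (see its definition below). It is easy to see that the analogue of Lemma 7.2 holds with

\[
R_x(\Gamma) := \sum_{\mathbf{x}} Q_x(\mathbf{x}) \sum_{\varrho_\Gamma} \Biggl(\prod_{\gamma \in \Gamma} \Delta_{\mathbf{x}}(\varrho_\gamma)\Biggr) \frac{1}{(M-1)^{\bar n}} .
\]

It therefore suffices to estimate the contribution of pairings $\Gamma \in \mathscr{P}^*_{n,n'}$. We have that

\[
R_x(\Gamma) = \frac{1}{(M-1)^{\bar n}} \sum_{\mathbf{x}} Q_x(\mathbf{x}) \prod_{\{e,e'\} \in \Gamma} \Bigl( 1\bigl(x_{a(e)} = x_{b(e')}\bigr) 1\bigl(x_{b(e)} = x_{a(e')}\bigr) + 1\bigl(x_{a(e)} = x_{a(e')}\bigr) 1\bigl(x_{b(e)} = x_{b(e')}\bigr) \Bigr) . \tag{9.3}
\]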

{e,e'}er 

Thus, the graphical representation of pairings has to be modified as follows. Each bridge a <G T carries a tag, 
straight or twisted, which arises from multiplying out the product in (9.3). Twisted bridges are graphically 
represented with dashed lines. 

In order to find a good notion of combinatorial complexity of pairings, we define antiparallel bridges as follows. Two bridges $\{e_i, e_j\}$ and $\{e_{i+1}, e_{j+1}\}$ are antiparallel if $i+1, j+1 \notin \{0, n\}$; see Figure 9.1. An antiladder is a sequence of bridges such that two consecutive bridges are antiparallel. It is easy to see that, in addition to ladders whose rungs are straight bridges, antiladders whose rungs are twisted bridges have a leading order contribution.

The skeleton $\Sigma = S(\Gamma)$ of the pairing $\Gamma$ is obtained from $\Gamma$ by the following procedure. A pair of parallel straight bridges is collapsed to form a single straight bridge. A pair of antiparallel twisted bridges is collapsed to form a single twisted bridge. This is repeated until no parallel straight bridges or antiparallel twisted bridges remain. The resulting pairing is the skeleton $\Sigma = S(\Gamma)$; see Figure 9.2.

Figure 9.1: Two antiparallel twisted bridges. Compare to Figure 7.1.

Figure 9.2: The construction of the tagged skeleton graph.

Thus we see that Lemma 7.3 holds. Moreover, Lemma 7.4 holds, provided that (i) is replaced with

(i') Each E e m , contains no parallel straight bridges and no antiparallel twisted bridges. 

Crucially, Lemma 7.7 remains valid for such tagged skeletons. This can be easily seen using the orbit 
construction of the proof of Lemma 7.7, combined with (i'). 

Next, we associate a factor $D_\ell(y,z)$ with each bridge $\sigma \in \Sigma$. If $\sigma$ is straight, this is done exactly as in Section 7.4. If $\sigma$ is twisted, this association follows immediately from the definition of the antiladder. Thus we find that Lemma 7.6 holds. The rest of the analysis in Section 7 carries over almost verbatim; the only required modification is the summation over the $2^m$ tag configurations of the bridges of $\Sigma$. The resulting factor $2^m$ is immaterial.

Finally, the complete ladder pairing yields (3.1). The complete antiladder is subleading, as its contribution 
vanishes unless x = 0. 



10. Delocalization: proofs of Theorem 3.3 and Corollary 3.4 

In this section we show how to derive Theorem 3.3 from Theorem 3.1, and derive Corollary 3.4 as a consequence.

Proof of Theorem 3.3. We use an argument due to Chen [9] showing that diffusive motion implies delocalization of the vast majority of eigenvectors.

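Recall that $P_{x,\ell}(y) := \mathbf{1}(|y - x| \geq \ell)$ is the characteristic function of the complement (in $\Lambda_N$) of the $\ell$-neighborhood of $x$. Also, $\mathfrak{A}^\omega_{\varepsilon,\ell}$, defined by
\[
\mathfrak{A}^\omega_{\varepsilon,\ell} \;=\; \Big\{ \alpha \in \mathfrak{A} \,:\, \sum_x |\psi^\omega_\alpha(x)| \, \|P_{x,\ell} \psi^\omega_\alpha\| < \varepsilon \Big\} \,,
\]
is the set of eigenvectors localized on a scale $\ell$ up to an error of $\varepsilon$.

By diagonalizing $H^\omega$,
\[
H^\omega \;=\; \sum_{\alpha \in \mathfrak{A}} \lambda^\omega_\alpha \, |\psi^\omega_\alpha\rangle \langle \psi^\omega_\alpha| \,,
\]
we have
\[
\|P_{x,\ell} \, e^{-itH^\omega} \delta_x\|^2 \;\leq\; \Big(1 + \frac{1}{\zeta}\Big) \Big\| \sum_{\alpha \in \mathfrak{A}^\omega_{\varepsilon,\ell}} P_{x,\ell} \, e^{-it\lambda^\omega_\alpha} \, \overline{\psi^\omega_\alpha(x)} \, \psi^\omega_\alpha \Big\|^2 + (1 + \zeta) \Big\| \sum_{\alpha \in \mathfrak{A} \setminus \mathfrak{A}^\omega_{\varepsilon,\ell}} P_{x,\ell} \, e^{-it\lambda^\omega_\alpha} \, \overline{\psi^\omega_\alpha(x)} \, \psi^\omega_\alpha \Big\|^2
\]
for any $\zeta > 0$. Next, we observe that the norm in the first term may be bounded by 1:
\[
\Big\| \sum_{\alpha \in \mathfrak{A}^\omega_{\varepsilon,\ell}} P_{x,\ell} \, e^{-it\lambda^\omega_\alpha} \, \overline{\psi^\omega_\alpha(x)} \, \psi^\omega_\alpha \Big\|^2 \;\leq\; \Big\| \sum_{\alpha \in \mathfrak{A}^\omega_{\varepsilon,\ell}} e^{-it\lambda^\omega_\alpha} \, \overline{\psi^\omega_\alpha(x)} \, \psi^\omega_\alpha \Big\|^2 \;=\; \sum_{\alpha \in \mathfrak{A}^\omega_{\varepsilon,\ell}} |\psi^\omega_\alpha(x)|^2 \;\leq\; \sum_{\alpha \in \mathfrak{A}} |\psi^\omega_\alpha(x)|^2 \;=\; 1 \,.
\]
Thus we get
\[
\|P_{x,\ell} \, e^{-itH^\omega} \delta_x\|^2 \;\leq\; \Big(1 + \frac{1}{\zeta}\Big) \Big\| \sum_{\alpha \in \mathfrak{A}^\omega_{\varepsilon,\ell}} P_{x,\ell} \, e^{-it\lambda^\omega_\alpha} \, \overline{\psi^\omega_\alpha(x)} \, \psi^\omega_\alpha \Big\| + (1 + \zeta) \Big\| \sum_{\alpha \in \mathfrak{A} \setminus \mathfrak{A}^\omega_{\varepsilon,\ell}} e^{-it\lambda^\omega_\alpha} \, \overline{\psi^\omega_\alpha(x)} \, \psi^\omega_\alpha \Big\|^2
\]
\[
\leq\; \Big(1 + \frac{1}{\zeta}\Big) \sum_{\alpha \in \mathfrak{A}^\omega_{\varepsilon,\ell}} |\psi^\omega_\alpha(x)| \, \|P_{x,\ell} \psi^\omega_\alpha\| + (1 + \zeta) \sum_{\alpha \in \mathfrak{A} \setminus \mathfrak{A}^\omega_{\varepsilon,\ell}} |\psi^\omega_\alpha(x)|^2 \,.
\]
Averaging over $x \in \Lambda_N$ (recall that $|\mathfrak{A}| = |\Lambda_N|$) yields
\[
\frac{1}{|\mathfrak{A}|} \sum_x \|P_{x,\ell} \, e^{-itH^\omega} \delta_x\|^2 \;\leq\; \Big(1 + \frac{1}{\zeta}\Big) \frac{1}{|\mathfrak{A}|} \sum_{\alpha \in \mathfrak{A}^\omega_{\varepsilon,\ell}} \sum_x |\psi^\omega_\alpha(x)| \, \|P_{x,\ell} \psi^\omega_\alpha\| + (1 + \zeta) \frac{1}{|\mathfrak{A}|} \sum_{\alpha \in \mathfrak{A} \setminus \mathfrak{A}^\omega_{\varepsilon,\ell}} \sum_x |\psi^\omega_\alpha(x)|^2
\]
\[
\leq\; \Big(1 + \frac{1}{\zeta}\Big) \varepsilon + (1 + \zeta) \frac{|\mathfrak{A} \setminus \mathfrak{A}^\omega_{\varepsilon,\ell}|}{|\mathfrak{A}|} \,,
\]
by definition of $\mathfrak{A}^\omega_{\varepsilon,\ell}$. Therefore
\[
\frac{|\mathfrak{A}^\omega_{\varepsilon,\ell}|}{|\mathfrak{A}|} \;\leq\; 1 - \frac{1}{1 + \zeta} \, \frac{1}{|\mathfrak{A}|} \sum_x \|P_{x,\ell} \, e^{-itH^\omega} \delta_x\|^2 + \frac{\varepsilon}{\zeta} \,.
\]
Taking the expectation yields
\[
\mathbb{E} \frac{|\mathfrak{A}^\omega_{\varepsilon,\ell}|}{|\mathfrak{A}|} \;\leq\; 1 - \frac{1}{1 + \zeta} \, \mathbb{E} \frac{1}{|\mathfrak{A}|} \sum_x \|P_{x,\ell} \, e^{-itH} \delta_x\|^2 + \frac{\varepsilon}{\zeta} \;=\; 1 - \frac{1}{1 + \zeta} \, \mathbb{E} \|P_{0,\ell} \, e^{-itH} \delta_0\|^2 + \frac{\varepsilon}{\zeta} \,, \tag{10.1}
\]
by translation invariance. Note that this estimate holds uniformly in $t$.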

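Next, pick a continuous function $\varphi(X)$ with values in $[0,1]$ that is equal to $0$ if $|X| \leq 1$ and $1$ if $|X| \geq 2$. Recalling that $\varrho(t,x) = \mathbb{E} |\langle \delta_x , e^{-itH/2} \delta_0 \rangle|^2$, we find
\[
\mathbb{E} \|P_{0,\ell} \, e^{-itH/2} \delta_0\|^2 \;=\; \sum_x \mathbf{1}(|x| \geq \ell) \, \varrho(t,x) \;\geq\; \sum_x \varphi\Big(\frac{x}{\ell}\Big) \varrho(t,x) \,.
\]
Now choose an exponent $\tilde\kappa$ satisfying $\kappa < \tilde\kappa < 1/3$ and set $t = W^{d\tilde\kappa}$. Thus, for $\ell = W^{1 + d\kappa/2}$,
\[
\sum_x \varphi\Big(\frac{x}{W^{1 + d\kappa/2}}\Big) \varrho(t,x) \;=\; \sum_x \varphi\Big(W^{d(\tilde\kappa - \kappa)/2} \, \frac{x}{W^{1 + d\tilde\kappa/2}}\Big) \varrho(W^{d\tilde\kappa}, x) \,.
\]
Since we have
\[
\lim_{W \to \infty} \varphi\big(W^{d(\tilde\kappa - \kappa)/2} X\big) = 1
\]
for $X \neq 0$ and $L(1,X)$ is continuous at $X = 0$, a simple limiting argument shows that Theorem 3.1 implies
\[
\lim_{W \to \infty} \sum_x \varphi\Big(W^{d(\tilde\kappa - \kappa)/2} \, \frac{x}{W^{1 + d\tilde\kappa/2}}\Big) \varrho(W^{d\tilde\kappa}, x) \;=\; \int dX \, L(1,X) \;=\; 1 \,.
\]
Since the left-hand side of the first inequality above is at most 1, we have hence proved that
\[
\lim_{W \to \infty} \mathbb{E} \big\|P_{0, W^{1 + d\kappa/2}} \, e^{-itH/2} \delta_0\big\|^2 = 1 \,.
\]
Plugging this into (10.1) yields
\[
\limsup_{W \to \infty} \, \mathbb{E} \frac{|\mathfrak{A}^\omega_{\varepsilon, W^{1 + d\kappa/2}}|}{|\mathfrak{A}|} \;\leq\; \frac{\zeta}{1 + \zeta} + \frac{\varepsilon}{\zeta} \,.
\]
Setting $\zeta = \sqrt{\varepsilon}$ completes the proof. □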
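The two elementary facts behind this argument can be checked numerically. The following Python sketch is illustrative only (the function names are assumptions, not notation from the paper): it tests the weighted splitting $\|a + b\|^2 \leq (1 + 1/\zeta)\|a\|^2 + (1 + \zeta)\|b\|^2$ on random complex vectors, and evaluates the final bound $\varepsilon/\zeta + \zeta/(1+\zeta)$, which at $\zeta = \sqrt{\varepsilon}$ is at most $2\sqrt{\varepsilon}$.

```python
import math
import random


def norm_sq(v):
    """Squared Euclidean norm of a complex vector given as a list."""
    return sum(abs(c) ** 2 for c in v)


def weighted_split_holds(a, b, zeta):
    """Check ||a + b||^2 <= (1 + 1/zeta) ||a||^2 + (1 + zeta) ||b||^2."""
    lhs = norm_sq([x + y for x, y in zip(a, b)])
    rhs = (1 + 1 / zeta) * norm_sq(a) + (1 + zeta) * norm_sq(b)
    return lhs <= rhs + 1e-9  # small slack for floating-point rounding


def localized_fraction_bound(eps, zeta):
    """The bound eps/zeta + zeta/(1 + zeta) on the fraction of localized eigenvectors."""
    return eps / zeta + zeta / (1 + zeta)


def random_vector(rng, dim):
    """Random complex vector with independent Gaussian entries."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]


def check_random_instances(trials=1000, dim=5, seed=0):
    """Return True if the weighted splitting holds on all random samples."""
    rng = random.Random(seed)
    return all(
        weighted_split_holds(random_vector(rng, dim), random_vector(rng, dim),
                             rng.uniform(0.01, 10.0))
        for _ in range(trials)
    )
```

The choice $\zeta = \sqrt{\varepsilon}$ balances the two terms of the bound up to a constant, which is why the statement of the theorem involves $\sqrt{\varepsilon}$ rather than $\varepsilon$.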

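Proof of Corollary 3.4. Pick an intermediate exponent $\tilde\kappa$ satisfying $\kappa < \tilde\kappa < 1/3$ and abbreviate
\[
\ell := W^{1 + d\kappa/2} \,, \qquad \tilde\ell := W^{1 + d\tilde\kappa/2} \,.
\]
Let $\alpha$ label an eigenvector in the set defined in (3.3), and let $u \in \Lambda_N$ be as in (3.3); we denote by $K$ the constant in (3.3). Then we find by Cauchy-Schwarz
\[
\Big( \sum_x |\psi_\alpha(x)| \, \|P_{x,\tilde\ell} \psi_\alpha\| \Big)^2 \;\leq\; \sum_x |\psi_\alpha(x)|^2 \exp\Big[ \Big( \frac{|x - u|}{\ell} \Big)^\gamma \Big] \; \sum_x \exp\Big[ -\Big( \frac{|x - u|}{\ell} \Big)^\gamma \Big] \|P_{x,\tilde\ell} \psi_\alpha\|^2
\]
\[
\leq\; K \sum_x \exp\Big[ -\Big( \frac{|x - u|}{\ell} \Big)^\gamma \Big] \sum_{y \,:\, |x - y| \geq \tilde\ell} |\psi_\alpha(y)|^2
\]
\[
\leq\; K \, e^{-\delta (\tilde\ell / \ell)^\gamma} \sum_x \sum_{y \,:\, |x - y| \geq \tilde\ell} \exp\Big[ -\Big( \frac{|x - u|}{\ell} \Big)^\gamma + \delta \Big( \frac{|x - y|}{\ell} \Big)^\gamma \Big] |\psi_\alpha(y)|^2 \,,
\]
where $\delta > 0$ is some small constant to be chosen later. Using $(a + b)^\gamma \leq (2a)^\gamma + (2b)^\gamma$ we find
\[
\Big( \sum_x |\psi_\alpha(x)| \, \|P_{x,\tilde\ell} \psi_\alpha\| \Big)^2 \;\leq\; K \, e^{-\delta (\tilde\ell / \ell)^\gamma} \sum_{x,y} \exp\Big[ -\Big( \frac{|x - u|}{\ell} \Big)^\gamma + \delta \Big( \frac{2|x - u|}{\ell} \Big)^\gamma + \delta \Big( \frac{2|y - u|}{\ell} \Big)^\gamma \Big] |\psi_\alpha(y)|^2 \,.
\]
Choosing $\delta < 2^{-\gamma}$ therefore yields
\[
\Big( \sum_x |\psi_\alpha(x)| \, \|P_{x,\tilde\ell} \psi_\alpha\| \Big)^2 \;\leq\; C K^2 \ell^d \, e^{-\delta (\tilde\ell / \ell)^\gamma} \;=:\; \varepsilon_W^2 \,.
\]
We have thus proved that the set defined in (3.3) is contained in $\mathfrak{A}^\omega_{\varepsilon_W, \tilde\ell}$. Then Corollary 3.4 follows from $\lim_{W \to \infty} \varepsilon_W = 0$ and Theorem 3.3. □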



11. Critical pairings 

In this section we give an example family of pairings which are critical in the sense that they saturate the 2/3 rule (Lemma 7.7). This implies that extending our results beyond time scales of order $W^{d/3}$ requires either a further resummation of pairings or a more refined classification of graphs in terms of their deviation from the 2/3 rule.

Let $k \geq 1$ and consider the skeleton pairing $\Sigma_k$ defined in Figure 11.1. It is a critical pairing in the sense that all orbits not containing the vertices $0$, $m$ consist of 3 vertices.

Figure 11.1: A critical skeleton pairing, consisting of $k$ blocks. The label ($y_1, \dots, y_4$) of each vertex is indicated next to its vertex.

It is easy to see that for $\Sigma_k$ we have
\[
m = 6k + 1 \,, \qquad L(\Sigma_k) = 4k + 1 \,.
\]
In particular, the 2/3 rule of Lemma 7.7 is saturated. Moreover, if $\ell_{\Sigma_k}$ satisfies $\ell_\sigma \geq 2$ for all $\sigma \in \Sigma_k$ then the associated pairing $\Gamma := G_{\ell_{\Sigma_k}}(\Sigma_k)$ has a nonzero contribution $V_x(\Gamma) \approx R_x(\Gamma) \approx M^{-2k}$ (here, and in the following, we ignore any powers of $W$ with exponent of order one). Indeed, it is easy to check that under the condition $\ell_\sigma \geq 2$ for all $\sigma$ the above $\Gamma$ satisfies all nonbacktracking conditions. (In fact, it suffices to require that $\ell_\sigma \geq 2$, where $\sigma$ is the bridge drawn as a vertical line in Figure 11.1.)
As shown in Section 7 (see (7.13)), the coefficients $\alpha_n(t)$ essentially vanish if $n > (1 + o(1)) t$. Setting $t = M^\kappa$ thus means restricting the summation to $n, n' \lesssim M^\kappa$.

Assume, to begin with, that we adopt the strategy of Section 7 in estimating the contribution of each graph, i.e. we use the 2/3 rule for each skeleton pairing and the $\ell^1$-$\ell^\infty$-type estimates from Lemma 7.5 on the edges of the associated multigraph. We show that the sum of the contributions of the skeleton pairings $\Sigma_k$ diverges if $\kappa > 1/3$. Indeed, noting that $n, n' \lesssim M^\kappa$ bounds the sum $\ell_1 + \cdots + \ell_{6k}$ by a quantity of order $M^\kappa$, we find that the contribution of all $\Sigma_k$'s is
\[
\sum_{k=1}^{p/6} \frac{1}{M^{2k}} \sum_{\ell_1 + \cdots + \ell_{6k} = p} 1 \,, \tag{11.1}
\]
where $p = M^\kappa$ and the sum over $\ell_i$ is restricted to $\ell_i \geq 2$ for all $i$. Here we only sum over the pairings of maximal size $\ell_1 + \cdots + \ell_{6k} = p$ (so as to obtain a lower bound), and set $6k + 1 \approx 6k$. Now (11.1) is equal to
\[
\sum_{k=1}^{p/6} \frac{1}{M^{2k}} \binom{p}{6k} \;\sim\; \sum_{k=1}^{p/6} \Big( \frac{C M^{\kappa}}{k M^{1/3}} \Big)^{6k} \,,
\]
which diverges as $W \to \infty$ if $\kappa > 1/3$. Hence a control of the error term at time scales $\kappa$ larger than 1/3 would require further resummation of such critical pairings.
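The threshold $\kappa = 1/3$ can be made visible numerically. The Python sketch below is an illustration under stated assumptions, not part of the proof: it uses the exact number $\binom{p - 6k - 1}{6k - 1}$ of ordered decompositions $\ell_1 + \cdots + \ell_{6k} = p$ with all $\ell_i \geq 2$ (which agrees with $\binom{p}{6k}$ to the leading order relevant here), works with logarithms to avoid overflow, and only scans the first `k_max` values of $k$. The function names and the sample value of $M$ are hypothetical.

```python
import math


def count_compositions(p, parts, min_part=2):
    """Number of ordered sums l_1 + ... + l_parts = p with every l_i >= min_part."""
    free = p - parts * min_part  # what is left after giving each part its minimum
    if free < 0:
        return 0
    return math.comb(free + parts - 1, parts - 1)


def log_term(k, M, kappa):
    """Log of the k-th term of (11.1): M^(-2k) times the number of compositions."""
    p = int(round(M ** kappa))
    parts = 6 * k
    free = p - 2 * parts
    if free < 0:
        return float("-inf")
    a, b = free + parts - 1, parts - 1
    # log C(a, b) = sum_{j < b} log(a - j) - log(b!)
    log_binom = sum(math.log(a - j) for j in range(b)) - math.lgamma(b + 1)
    return log_binom - 2 * k * math.log(M)


def max_log_term(M, kappa, k_max=200):
    """Largest log-term among k = 1, ..., k_max."""
    return max(log_term(k, M, kappa) for k in range(1, k_max + 1))
```

For a large value such as `M = 1e30`, `max_log_term(M, 0.4)` is positive (some terms are much larger than 1), while `max_log_term(M, 0.3)` is negative (all scanned terms are tiny), in line with divergence for $\kappa > 1/3$ as $W \to \infty$.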

In the estimates of the preceding paragraph we did not make full use of the heat kernel decay associated with each skeleton bridge. For simplicity, the following discussion is restricted to $d = 1$ (it may be easily extended to higher dimensions; in fact some estimates are better in higher dimensions). Using Lemma 7.5, we may improve (11.1) to
\[
\sum_{k=1}^{p/6} \frac{1}{M^{2k}} \sum_{\ell_1 + \cdots + \ell_{6k} = p} \frac{1}{\sqrt{\ell_1 \cdots \ell_{2k}}} \,; \tag{11.2}
\]
this is a simple consequence of the heat kernel bound of Lemma 7.5 and the fact that each six-block of $\Sigma_k$ contains two bridges in $\Sigma_k \setminus (\Sigma_k)_T$ for which we may apply the bound (7.7b) (in which we drop the unimportant second term for simplicity). Now (11.2) is bounded by
\[
\sum_{k=1}^{p/6} \frac{1}{M^{2k}} \, p^k \binom{p}{4k} \;\sim\; \sum_{k=1}^{p/6} \Big( \frac{C M^{5\kappa/6}}{k^{2/3} M^{1/3}} \Big)^{6k} \,, \tag{11.3}
\]
which is summable for $\kappa < 2/5$. Note, however, that the factor $k^{-6k}$ from (11.1) has been replaced with the larger factor $k^{-4k}$. Recall that the factor $k^{-6k}$ is used to cancel the combinatorics $m! \sim k^{6k}$ arising from the summation over all skeletons. In the present example this small factor is not needed, as the family $\{\Sigma_k\}$ is small. It is clear, however, that a systematic application of this approach requires a more refined classification of skeletons in terms of how much they deviate from the 2/3 rule. One expects that the number of skeletons saturating the 2/3 rule is small, and that they are therefore amenable to estimates of type (11.3). Conversely, most of the $m!$ skeletons are expected to deviate strongly from the 2/3 rule, so that their greater number is compensated by their small individual contributions.

Finally, we mention that the upper bound (7.7), used in the $\ell^1$-$\ell^\infty$-type estimates above, neglects the spatial decay of the heat kernel, i.e. that
\[
D_\ell(x,y) \;\sim\; \frac{1}{\sqrt{\ell}} \, e^{-(x - y)^2 / \ell} \tag{11.4}
\]
for $|x - y| \ll N$. Thus a correct lower bound on the contribution of each skeleton graph should have taken into account this additional decay as well. A somewhat lengthier calculation shows that with the asymptotics (11.4) the estimate (11.2) may be improved to

P/6 k 

fc =i M2k e 1+ -+e 6k = P i= i yjefef + if if + if if + ~tifl<f + if if + if if + tiflf ' 

where we abbreviated $\ell_i^j := \ell_{6(j-1)+i}$. It is not hard to see that the resulting bound is the same as (11.3), with a smaller constant $C$. In other words, the gain obtained from the spatial decay of the heat kernel is immaterial, and the $\ell^1$-$\ell^\infty$-estimates cannot be improved.

In conclusion: Our estimates rely on an indiscriminate application of the 2/3 rule to all skeleton pairings; going beyond time scales of order $W^{d/3}$ would require either (i) a refined classification of the skeleton pairings in terms of how much they deviate from the 2/3 rule, combined with a systematic use of the bound (7.7b) on all bridges in $\Sigma \setminus \Sigma_T$; or (ii) a further resummation of graphs in order to exploit cancellations. The approach (i) can be expected to reach at most times of order $W^{2/5}$ for $d = 1$.



A. Proof of Proposition 8.5 



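Note first that $F$ is monotone nondecreasing and satisfies $0 \leq F(\lambda) \leq 1$, as follows from (5.4). Hence it is enough to prove (8.5a) for $\lambda \in (0,1)$.

For the following it is convenient to replace $F_t$ with $\tilde F_t$, defined by
\[
\tilde F_t(\lambda) \;:=\; \sum_{n=0}^{[\lambda t]} |\alpha_n(t)|^2 \,.
\]
By Lemma 8.8 we have $F_t(\lambda) - \tilde F_t(\lambda) = o(1)$ as $t \to \infty$.

Thus let $\lambda \in (0,1)$ be fixed. From (5.2) we find
\[
\tilde F_t(\lambda) \;=\; \sum_{n=0}^{[\lambda t]} \bigg| \frac{2}{\pi} \int_0^\pi d\theta \, \sin\theta \, \sin[(n+1)\theta] \, e^{-it\cos\theta} \bigg|^2 \,,
\]
where we used (5.1). Thus,
\[
\tilde F_t(\lambda) \;=\; \frac{1}{\pi^2} \int_0^\pi d\theta \int_0^\pi d\theta' \, \sin\theta \, \sin\theta' \, e^{it(\cos\theta - \cos\theta')} \bigg[ \frac{e^{i([\lambda t]+1)(\theta+\theta')} - 1}{e^{-i(\theta+\theta')} - 1} + \frac{e^{-i([\lambda t]+1)(\theta+\theta')} - 1}{e^{i(\theta+\theta')} - 1} - \frac{e^{i([\lambda t]+1)(\theta-\theta')} - 1}{e^{-i(\theta-\theta')} - 1} - \frac{e^{-i([\lambda t]+1)(\theta-\theta')} - 1}{e^{i(\theta-\theta')} - 1} \bigg] \,. \tag{A.1}
\]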
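The bracket in (A.1) comes from summing four geometric series: with $N = [\lambda t] + 1$, the sum $\sum_{a=1}^{N} \sin(a\theta) \sin(a\theta')$ equals one quarter of the combination of four ratios $(e^{\pm iN\phi} - 1)/(e^{\mp i\phi} - 1)$, taken with a plus sign for $\phi = \theta + \theta'$ and a minus sign for $\phi = \theta - \theta'$. The following Python sketch checks this identity numerically, together with the identity $1/(e^{-i\phi} - 1) + 1/(e^{i\phi} - 1) = -1$; the function names are illustrative assumptions.

```python
import cmath
import math


def direct_sum(theta, theta_p, N):
    """Sum over a = 1..N of sin(a*theta) * sin(a*theta_p)."""
    return sum(math.sin(a * theta) * math.sin(a * theta_p) for a in range(1, N + 1))


def ratio(phi, sign, N):
    """(e^{sign*i*N*phi} - 1) / (e^{-sign*i*phi} - 1)."""
    return (cmath.exp(sign * 1j * N * phi) - 1) / (cmath.exp(-sign * 1j * phi) - 1)


def bracket(theta, theta_p, N):
    """Four-term combination of geometric sums, with N = [lambda*t] + 1."""
    s, d = theta + theta_p, theta - theta_p
    return ratio(s, +1, N) + ratio(s, -1, N) - ratio(d, +1, N) - ratio(d, -1, N)


def constant_bracket(phi):
    """1/(e^{-i phi} - 1) + 1/(e^{i phi} - 1), which equals -1 for phi != 0."""
    return 1 / (cmath.exp(-1j * phi) - 1) + 1 / (cmath.exp(1j * phi) - 1)
```

The prefactor $1/\pi^2$ in (A.1) is then the product of $(2/\pi)^2$ from the two integrals and the factor $1/4$ from this identity.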



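We now claim that the limit $t \to \infty$ of the first two terms of (A.1) vanishes by a stationary phase argument. Let us write the first term of (A.1) as $R^1_t + R^2_t$, where
\[
R^1_t \;:=\; \frac{1}{\pi^2} \int_0^\pi d\theta \int_0^\pi d\theta' \, \sin\theta \, \sin\theta' \, e^{it(\cos\theta - \cos\theta')} \, \frac{e^{i([\lambda t]+1)(\theta+\theta')}}{e^{-i(\theta+\theta')} - 1} \;=\; \frac{1}{\pi^2} \int_0^\pi d\theta \int_0^\pi d\theta' \, a_t(\theta,\theta') \, e^{it\phi(\theta,\theta')} \,,
\]
with
\[
a_t(\theta,\theta') := \sin\theta \, \sin\theta' \, \frac{e^{i(1 - \{\lambda t\})(\theta+\theta')}}{e^{-i(\theta+\theta')} - 1} \,, \qquad \phi(\theta,\theta') := \cos\theta - \cos\theta' + \lambda\theta + \lambda\theta' \,,
\]
where $\{\xi\} := \xi - [\xi] \in [0,1)$. One readily finds the bounds
\[
\inf_{\theta,\theta' \in [0,\pi]} |\nabla\phi(\theta,\theta')| \geq \lambda > 0 \,, \qquad \sup_{\theta,\theta' \in [0,\pi]} |\nabla^2\phi(\theta,\theta')| < \infty \,, \qquad \sup_t \sup_{\theta,\theta' \in [0,\pi]} |\nabla a_t(\theta,\theta')| < \infty \,.
\]
A standard stationary phase argument therefore yields $\lim_{t \to \infty} R^1_t = 0$.

Similarly, we find
\[
R^2_t \;=\; -\frac{1}{\pi^2} \int_0^\pi d\theta \int_0^\pi d\theta' \, e^{it(\cos\theta - \cos\theta')} \, b(\theta,\theta') \,, \qquad b(\theta,\theta') := \frac{\sin\theta \, \sin\theta'}{e^{-i(\theta+\theta')} - 1} \,.
\]
As above, the functions $b$ and $\nabla b$ are bounded on $[0,\pi]^2$. The phase $\cos\theta - \cos\theta'$ has four stationary points, $(0,0)$, $(0,\pi)$, $(\pi,0)$, $(\pi,\pi)$, all of them nondegenerate. Therefore a standard stationary phase argument implies that $R^2_t = O(t^{-1/2})$. (Note that the stationary points lie on the boundary of the integration domain. This is not a problem, however, as the usual stationary phase argument may be applied in combination with the identity $\int_0^\infty dx \, e^{itx^2} = O(t^{-1/2})$.) Similarly, one shows that the second term of (A.1) vanishes as $t \to \infty$.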
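Next, as we have just shown, we have
\[
\tilde F_t(\lambda) \;=\; \tilde F^0_t(\lambda) + \tilde F^+_t(\lambda) + \tilde F^-_t(\lambda) + o(1)
\]
for $t \to \infty$, where
\[
\tilde F^0_t(\lambda) \;:=\; \frac{1}{\pi^2} \int_0^\pi d\theta \int_0^\pi d\theta' \, \sin\theta \, \sin\theta' \, e^{it(\cos\theta - \cos\theta')} \bigg[ \frac{1}{e^{-i(\theta-\theta')} - 1} + \frac{1}{e^{i(\theta-\theta')} - 1} \bigg] \,,
\]
\[
\tilde F^\pm_t(\lambda) \;:=\; -\frac{1}{\pi^2} \int_0^\pi d\theta \; \mathcal{P}\!\!\int_0^\pi d\theta' \, \sin\theta \, \sin\theta' \, e^{it(\cos\theta - \cos\theta')} \, \frac{e^{\pm i([\lambda t]+1)(\theta-\theta')}}{e^{\mp i(\theta-\theta')} - 1} \,,
\]
where $\mathcal{P}$ denotes principal value. We now show that $\tilde F^0_t(\lambda) = o(1)$. Indeed, the expression in square brackets in the definition of $\tilde F^0_t(\lambda)$ is equal to $-1$. Exactly as above we therefore conclude that $\tilde F^0_t(\lambda) = O(t^{-1/2})$.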



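Next, let us consider $\tilde F^+_t(\lambda)$. In a first step, we replace the factor $\frac{1}{e^{-i(\theta-\theta')} - 1}$ with $\frac{1}{-i(\theta-\theta')}$. The error is
\[
-\frac{1}{\pi^2} \int_0^\pi d\theta \; \mathcal{P}\!\!\int_0^\pi d\theta' \, \sin\theta \, \sin\theta' \, e^{it(\cos\theta - \cos\theta')} \, e^{i([\lambda t]+1)(\theta-\theta')} \bigg[ \frac{1}{e^{-i(\theta-\theta')} - 1} - \frac{1}{-i(\theta-\theta')} \bigg] \,,
\]
which vanishes in the limit $t \to \infty$ by the above saddle point argument (the expression in the square brackets is analytic in a neighbourhood of the integration domain $|\theta - \theta'| \leq \pi$, and the phase $\cos\theta - \cos\theta' + \lambda\theta - \lambda\theta'$ has the four nondegenerate saddle points defined by $\sin\theta = \sin\theta' = \lambda$).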



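In a second step, we choose a scale $2/5 < \varepsilon < 1/2$ and introduce a cutoff in $|\theta - \theta'|$ at $t^{-\varepsilon}$. Thus we have
\[
\tilde F^+_t(\lambda) + o(1) \;=\; \frac{1}{\pi^2} \int_0^\pi d\theta \; \mathcal{P}\!\!\int_0^\pi d\theta' \, \mathbf{1}(|\theta - \theta'| \leq t^{-\varepsilon}) \, \sin\theta \, \sin\theta' \, e^{it(\cos\theta - \cos\theta')} \, \frac{e^{i([\lambda t]+1)(\theta-\theta')}}{i(\theta - \theta')}
\]
\[
+ \; \frac{1}{\pi^2} \int_0^\pi d\theta \int_0^\pi d\theta' \, \mathbf{1}(|\theta - \theta'| > t^{-\varepsilon}) \, \sin\theta \, \sin\theta' \, e^{it(\cos\theta - \cos\theta')} \, \frac{e^{i([\lambda t]+1)(\theta-\theta')}}{i(\theta - \theta')} \,. \tag{A.2}
\]
Let us abbreviate $D_t := \{(\theta,\theta') \in [0,\pi]^2 : |\theta - \theta'| > t^{-\varepsilon}\}$. The second term on the right-hand side of (A.2) is equal to
\[
\frac{1}{i\pi^2} \int_{D_t} d\theta \, d\theta' \; \frac{\sin\theta \, \sin\theta' \, e^{i(1 - \{\lambda t\})(\theta-\theta')}}{\theta - \theta'} \; e^{it(\cos\theta - \cos\theta' + \lambda\theta - \lambda\theta')} \,. \tag{A.3}
\]
In the domain $D_t$ the phase has two stationary points defined by $\sin\theta = \sin\theta' = \lambda$ and $\theta \neq \theta'$. For all $(\theta,\theta')$ not in some fixed neighbourhood of these stationary points and satisfying $|\theta - \theta'| > t^{-\varepsilon}$, we have the bound
\[
\big|\nabla\big(\cos\theta - \cos\theta' + \lambda\theta - \lambda\theta'\big)\big| \;\geq\; c \, t^{-\varepsilon} \,,
\]
for some constant $c > 0$ depending on $\lambda$, and large enough $t$. Thus a standard saddle point analysis shows that (A.3) is of the order $t^{-1/2} + t^{2\varepsilon - 1} = o(1)$.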

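In a third step, we analyse the first term on the right-hand side of (A.2). We introduce the new coordinates
\[
u = \frac{\theta + \theta'}{2} \,, \qquad v = \theta - \theta' \,,
\]
and write
\[
\tilde F^+_t(\lambda) + o(1) \;=\; \frac{1}{\pi^2} \int_0^\pi d\theta \; \mathcal{P}\!\!\int_0^\pi d\theta' \, \mathbf{1}(|\theta - \theta'| \leq t^{-\varepsilon}) \, \sin\theta \, \sin\theta' \, e^{it(\cos\theta - \cos\theta')} \, \frac{e^{i([\lambda t]+1)(\theta-\theta')}}{i(\theta - \theta')}
\]
\[
=\; \frac{1}{\pi^2} \int_0^\pi du \; \mathcal{P}\!\!\int_{-a_{t,u}}^{a_{t,u}} dv \, \sin\Big(u + \frac{v}{2}\Big) \sin\Big(u - \frac{v}{2}\Big) \, e^{it(\lambda v - 2 \sin u \, \sin\frac{v}{2})} \, \frac{e^{i(1 - \{\lambda t\}) v}}{iv} \,,
\]
where
\[
a_{t,u} \;:=\; \min\{t^{-\varepsilon}, 2u, 2(\pi - u)\} \,.
\]
Now we replace the factor $e^{it(\lambda v - 2 \sin u \, \sin\frac{v}{2})}$ with $e^{itv(\lambda - \sin u)}$. The resulting error is
\[
R_t \;:=\; \frac{1}{\pi^2} \int_0^\pi du \; \mathcal{P}\!\!\int_{-a_{t,u}}^{a_{t,u}} dv \, \sin\Big(u + \frac{v}{2}\Big) \sin\Big(u - \frac{v}{2}\Big) \, e^{i(1 - \{\lambda t\}) v} \, e^{itv(\lambda - \sin u)} \, \frac{e^{it \sin u \, (v - 2 \sin\frac{v}{2})} - 1}{iv} \,. \tag{A.4}
\]
It is easy to check that, for $v \in [-a_{t,u}, a_{t,u}]$, we have
\[
\bigg| \frac{e^{it \sin u \, (v - 2 \sin\frac{v}{2})} - 1}{iv} \bigg| \;\leq\; C t v^2 \,.
\]
Therefore
\[
|R_t| \;\leq\; C \int_0^\pi du \int_0^{t^{-\varepsilon}} dv \; t v^2 \;\leq\; C t^{1 - 3\varepsilon} \;=\; o(1) \,.
\]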
Jo Jo 
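Thus we may write
\[
\tilde F^+_t(\lambda) + o(1) \;=\; \frac{1}{\pi^2} \int_0^\pi du \, I_t(u) \,, \qquad I_t(u) \;:=\; \mathcal{P}\!\!\int_{-a_{t,u}}^{a_{t,u}} dv \, \sin\Big(u + \frac{v}{2}\Big) \sin\Big(u - \frac{v}{2}\Big) \, e^{i(1 - \{\lambda t\}) v} \, \frac{e^{itv(\lambda - \sin u)}}{iv} \,.
\]
In a fourth step, we analyse $I_t(u)$ using contour integration. Abbreviate $b := \lambda - \sin u$. Let us assume that $u$ satisfies $b \neq 0$. Then, setting $z = |b| t v$, we find
\[
I_t(u) \;=\; \mathcal{P}\!\!\int_{-|b| t a_{t,u}}^{|b| t a_{t,u}} dz \, \sin\Big(u + \frac{z}{2|b|t}\Big) \sin\Big(u - \frac{z}{2|b|t}\Big) \, e^{i(1 - \{\lambda t\}) \frac{z}{|b| t}} \, \frac{e^{i \, \mathrm{sgn}(b) \, z}}{iz} \,.
\]
Let us consider the case $b > 0$. Using the identity
\[
\mathcal{P} \frac{1}{v} \;=\; i\pi\delta(v) + \frac{1}{v + i0}
\]
and Cauchy's theorem, we find
\[
I_t(u) \;=\; \pi \sin^2(u) + \int_\gamma dz \, \sin\Big(u + \frac{z}{2bt}\Big) \sin\Big(u - \frac{z}{2bt}\Big) \, e^{i(1 - \{\lambda t\}) \frac{z}{bt}} \, \frac{e^{iz}}{iz} \,,
\]
where $\gamma$ is the arc $\{b t a_{t,u} (\cos\varphi, \sin\varphi) : \varphi \in [0,\pi]\}$, oriented from $-b t a_{t,u}$ to $b t a_{t,u}$. The absolute value of the integral is bounded by
\[
\int_0^\pi d\varphi \; e^{a_{t,u} \sin\varphi} \, e^{-b t a_{t,u} \sin\varphi} \,,
\]
which is bounded uniformly in $t$ and $b \neq 0$, and vanishes in the limit $t \to \infty$ for all $b \neq 0$. The case $b < 0$ is treated in the same way. In summary, we have, for each $u$ satisfying $\sin u \neq \lambda$, that
\[
|I_t(u)| \leq C \,, \qquad \lim_{t \to \infty} I_t(u) = \pi \sin^2(u) \, \mathrm{sgn}(\lambda - \sin u) \,.
\]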
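The value $\pm\pi$ produced by the contour argument can be seen in the simplest model case, where the slowly varying factors are frozen at their value at $z = 0$: the principal-value integral of $e^{\pm iz}/(iz)$ over a large symmetric interval tends to $\pm\pi$. The Python sketch below approximates this integral with a midpoint rule on a grid that is symmetric about the origin (so that the singular odd part cancels, which is one way to realise the principal value). The function name, the cutoff `R` and the number of grid points are assumptions made for illustration.

```python
import cmath
import math


def pv_integral(sign, R=200.0, n=40000):
    """Midpoint-rule approximation of the principal value of
    the integral of e^{i*sign*z} / (i*z) over [-R, R].

    n must be even so that no midpoint coincides with z = 0 and the grid
    is symmetric about the origin.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = 2 * R / n
    total = 0j
    for j in range(n):
        z = -R + (j + 0.5) * h
        total += cmath.exp(1j * sign * z) / (1j * z)
    return total * h
```

The real part of the result is the integral of $\sin(\pm z)/z$, which approaches $\pm\pi$ with an error of order $1/R$; the imaginary part, coming from the odd function $\cos(z)/z$, cancels by symmetry.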

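Hence by dominated convergence we get
\[
\lim_{t \to \infty} \tilde F^+_t(\lambda) \;=\; \frac{1}{\pi} \int_0^\pi du \, \sin^2(u) \, \mathrm{sgn}(\lambda - \sin u) \,.
\]
A similar (in fact easier) analysis yields
\[
\lim_{t \to \infty} \tilde F^-_t(\lambda) \;=\; \frac{1}{\pi} \int_0^\pi du \, \sin^2(u) \, \mathrm{sgn}(\lambda + \sin u) \,.
\]
Therefore we get
\[
\lim_{t \to \infty} \tilde F_t(\lambda) \;=\; \frac{2}{\pi} \int_0^\pi du \, \sin^2(u) \, \mathbf{1}(\sin u < \lambda) \;=\; \frac{4}{\pi} \int_0^\lambda d\xi \, \frac{\xi^2}{\sqrt{1 - \xi^2}} \,.
\]
This completes the proof of Proposition 8.5.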
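As a sanity check on the final formula, the two integral representations of the limit can be compared numerically with the closed form $\frac{2}{\pi}\big(\arcsin\lambda - \lambda\sqrt{1 - \lambda^2}\big)$, which follows from the substitution $\xi = \sin u$. The Python sketch below uses a plain midpoint rule; the function names and grid sizes are illustrative assumptions.

```python
import math


def F_closed(lam):
    """Closed form of (4/pi) * integral_0^lam xi^2 / sqrt(1 - xi^2) d xi."""
    return (2 / math.pi) * (math.asin(lam) - lam * math.sqrt(1 - lam * lam))


def F_angle(lam, n=200000):
    """(2/pi) * integral_0^pi sin^2(u) 1(sin u < lam) du via the midpoint rule."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        s = math.sin((j + 0.5) * h)
        if s < lam:
            total += s * s
    return (2 / math.pi) * total * h


def F_xi(lam, n=200000):
    """(4/pi) * integral_0^lam xi^2 / sqrt(1 - xi^2) d xi via the midpoint rule.

    Intended for lam < 1, where the integrand is bounded on [0, lam].
    """
    h = lam / n
    total = 0.0
    for j in range(n):
        xi = (j + 0.5) * h
        total += xi * xi / math.sqrt(1 - xi * xi)
    return (4 / math.pi) * total * h
```

In particular `F_closed(1.0)` equals 1, consistent with the limit being a distribution function supported on $[0,1]$.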